WO2000039339A1 - Statistical combining of cell expression profiles - Google Patents

Statistical combining of cell expression profiles

Info

Publication number
WO2000039339A1
Authority
WO
WIPO (PCT)
Prior art keywords
microarray
fluorophore
pool
cellular constituent
genetic matter
Application number
PCT/US1999/030837
Other languages
English (en)
French (fr)
Inventor
Roland Stoughton
Hongyue Dai
Original Assignee
Rosetta Inpharmatics, Inc.
Application filed by Rosetta Inpharmatics, Inc.
Priority to CA2356696A (patent CA2356696C)
Priority to AU23855/00A (patent AU774830B2)
Priority to EP99967594A (patent EP1141411A4)
Priority to JP2000591227A (patent JP2002533701A)
Publication of WO2000039339A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6834: Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837: Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10: TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10T: TECHNICAL SUBJECTS COVERED BY FORMER US CLASSIFICATION
    • Y10T436/00: Chemistry: analytical and immunological testing
    • Y10T436/10: Composition for standardization, calibration, simulation, stabilization, preparation or preservation; processes of use in preparation for chemical testing

Definitions

  • the field of this invention relates to methods for using data from multiple repeated experiments to generate a confidence value for each data point, increase sensitivity, and eliminate systematic experimental bias.
  • Cellular constituents include gene expression levels, abundance of mRNA encoding specific genes, and protein expression levels in a biological system.
  • Levels of various constituents of a cell, such as mRNA encoding genes and/or protein expression levels, are known to change in response to drug treatments and other perturbations of the cell's biological state. Measurements of a plurality of such "cellular constituents" therefore contain a wealth of information about the effect of perturbations on the cell's biological state. The collection of such measurements is generally referred to as the "profile" of the cell's biological state.
  • LC/MS/MS microcolumn reversed-phase liquid chromatography electrospray ionization tandem mass spectrometry
  • Mortensen et al. describe a method for producing embryonic stem (ES) cell lines whereby both alleles are inactivated by homologous recombination. Using the methods of Mortensen et al., it is possible to obtain homozygous mutationally altered cells, i.e., double knockouts of ES cell lines. Mortensen et al. propose that their method may be generally applicable to other genes and to cell lines other than ES cells. Mortensen et al., 1992, Production of homozygous mutant ES cells with a single targeting construct, Cell Biol. 12:2391-2395. In another promising technology Wach et al.
  • Comparison of profiles with other profiles in a database can give clues to the molecular targets of drugs and related functions, efficacy and toxicity of drug candidates and/or pharmacological agents. Such comparisons may also be used to derive consensus profiles representative of ideal drug activities or disease states. Profile comparison can also help detect diseases in a patient at an early stage and provide improved clinical outcome projections for a patient diagnosed with a disease.
  • the methods of the present invention provide a novel method for fluorophore bias removal. This allows for the attenuation of fluorophore specific biases to acceptable levels based on only two nominal repeats of a cellular constituent quantification experiment.
  • the present invention further provides methods for combining nominal repeats of a cellular constituent quantification experiment based on rank order of up-regulation or down-regulation. In these methods, cellular constituent up- or down-regulation data determined from nominal repeats of cellular constituent quantification experiments are expressed by a novel metric that is free of intensity-dependent errors. Application of this metric before combining based on rank order provides a powerful method for removing error from weakly expressing cellular constituents without an excessive number of nominal repetitions of the expensive cellular constituent quantification experiment.
  • Another aspect of the present invention is an improved method for computing a weighted average of individual cellular constituent measurements in nominally repeated cellular constituent quantification experiments.
  • a novel method for calculating the error associated with each cellular constituent measurement is provided.
  • the error bar in the weighted average is sharply attenuated.
  • these improved methods for computing a weighted average are applicable to two-fluorophore (two-color) or single fluorophore (one-color) protocols.
  • Another embodiment of the invention provides a method for determining a probability that an expression level of a cellular constituent in a plurality of paired differential microarray experiments is altered by a perturbation, wherein each paired differential microarray experiment in said plurality of paired differential microarray experiments comprises a first microarray experiment representing a baseline state of a first biological system, and a second microarray experiment representing a perturbed state of said first biological system, said method comprising the steps of
  • Yet another embodiment of the invention is a method for determining a weighted mean differential intensity in an expression level of a cellular constituent in a biological system in response to a perturbation, the method comprising:
  • step (d) computing the weighted mean differential intensity by inversely weighting each amount of the differential expression of the cellular constituent determined in step (b) by the corresponding amount of error determined in step (c) according to the formula $\bar{x} = \frac{\sum_i x_i / \sigma_i^2}{\sum_i 1 / \sigma_i^2}$, where
  • $\bar{x}$ is the weighted mean differential intensity of the cellular constituent,
  • $x_i$ is the i-th differential expression measurement of the cellular constituent, and
  • $\sigma_i^2$ is the corresponding error for differential expression measurement $x_i$.
  • FIG. 1 depicts some sources of measurement error present in microarray fluorescent images.
  • A depicts unevenly printed DNA probe spots.
  • B depicts the effects of scratches, dust, and artifacts.
  • C depicts how spot positions drift away from a nominal measuring grid.
  • D depicts the effects of unevenness in the brightness across the microarray due to uneven hybridization.
  • E depicts the effects of color stripes on the microarray due to fluorophore-specific biases.
  • Fig. 2 illustrates the effect of deleting genes responsible for the production of calcineurin protein in the yeast S. cerevisiae (CNA1 and CNA2).
  • the figure contrasts the response profile of two yeast cultures, a native culture (Culture 1) and a culture in which CNA1 and CNA2 have been deleted (Culture 2).
  • the horizontal axis is the log10 of the intensity of the individual hybridized spots on the microarray obtained from the two yeast cultures, and therefore represents mRNA species abundance.
  • the vertical axis is the log10 of the ratio of the intensity measured for one fluorescent label (Culture 1) to that measured for the other label (Culture 2) (expression ratio).
  • Fig. 3 depicts the intensity-dependent bias that occurs in cell expression profile experiments due to variance in fluorophore optical detection efficiencies as well as variance in fluorophore incorporation efficiencies.
  • Fig. 4A is a color ratio vs. intensity plot for an experiment in which both cultures were the same background strain of the yeast S. cerevisiae. Genes with a distinct bias between a red and green fluorophore are flagged.
  • Fig. 4B is the same experiment as depicted in Fig. 4A except that usage of the red and green fluorophores is reversed.
  • Fig. 4C depicts the bias removal process of the invention, wherein Fig. 4A and Fig. 4B are combined to produce a response profile free of fluorophore-specific biases.
  • Fig. 5 compares two identical response profiles that were performed under identical experimental conditions. The figure shows that experimental errors decrease as a function of intensity (expression level). Intensity-independent contour lines illustrate a component of the error correction methods of the present invention.
  • Fig. 6a shows a typical signature plot for a single experiment with the drug Cyclosporin A.
  • Fig. 6b shows the results of applying a weighted average according to the methods of the present invention to four repeats of the experiment depicted in Fig. 6a.
  • Fig. 7 illustrates a computer system useful for embodiments of the invention.
  • a perturbation is the experimental or environmental condition(s) associated with a biological system. Perturbations may be achieved by exposure of a biological system to a drug candidate or pharmacologic agent, the introduction of an exogenous gene into a biological system, the deletion of a gene from the biological system, changes in the culture conditions of the biological system, or any other art recognized method of perturbing a biological system. Further, perturbation of a biological system may be achieved by the onset of disease in the biological system.
  • genetic matter refers to nucleic acids such as messenger RNA (“mRNA”), complementary DNA (“cDNA”), genomic DNA (“gDNA”), DNA, RNA, genes, oligonucleotides, gene fragments, and any combination thereof.
  • Fluorophore-labeled genetic matter refers to genetic matter that has been labeled with a fluorescently-labeled probe ("fluorophore").
  • Fluorophores include, but are not limited to, fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others (see, e.g., Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, CA).
  • This DNA may be prepared by reverse transcription of mRNA or by (PCR/IVT) or (IVT) with use of fluorophores, as those skilled in the art will appreciate. See, e.g., Gelder et al., 1990, "Amplified RNA synthesized from limited quantities of heterogeneous cDNA," Proc. Natl. Acad. Sci. USA 87:1663-1667.
  • PCR refers to the Polymerase Chain Reaction.
  • Biological System is broadly defined to include any cell, tissue, organ or multicellular organism.
  • a biological system can be a cell line, a cell culture, a tissue sample obtained from a subject, a Homo sapiens, a mammal, a yeast substantially isogenic to Saccharomyces cerevisiae, or any other art recognized biological system.
  • the state of a biological system can be measured by the content, activities or structures of its cellular constituents.
  • the state of a biological system is determined by the state of a collection of cellular constituents, which are sufficient to characterize the cell or organism for an intended purpose including characterizing the effects of a drug or other perturbation.
  • cellular constituent encompasses any kind of measurable biological variable.
  • the measurements and/or observations made on the state of these constituents can be of their abundances (i.e., amounts or concentrations in a biological system), their activities, their states of modification (e.g., phosphorylation), or other art recognized measurements relevant to the physiological state of a biological system.
  • this invention includes making such measurements and/or observations on different collections of cellular constituents. These different collections of cellular constituents are also called aspects of the biological state of a biological system.
  • the transcriptional state of a biological system includes the identities and abundances of the constituent RNA species, especially mRNAs, in the cell under a given set of conditions. Often, a substantial fraction of all constituent RNA species in the biological system are measured, but at least a sufficient fraction is measured to characterize the action of a drug or other perturbation of interest.
  • the transcriptional state of a biological system can be conveniently determined by measuring cDNA abundances by any of several existing gene expression technologies. DNA arrays for measuring mRNA or transcript level of a large number of genes can be employed to ascertain the biological state of a system.
  • the translational state of a biological system includes the identities and abundances of the constituent protein species in the biological system under a given set of conditions. Preferably a substantial fraction of all constituent protein species in the biological system is measured, but at least a sufficient fraction is measured to characterize the action of a drug of interest.
  • the transcriptional state is often representative of the translational state.
  • Other aspects of the biological state of a biological system are also of use in this invention.
  • the activity state of a biological system includes the activities of the constituent protein species (and also optionally catalytically active nucleic acid species) in the biological system under a given set of conditions.
  • the translational state is often representative of the activity state.
  • This invention is also adaptable, where relevant, to "mixed" aspects of the biological state of a biological system in which measurements of different aspects of the biological state of a biological system are combined. For example, in one mixed aspect, the abundances of certain RNA species and of certain protein species, are combined with measurements of the activities of certain other protein species. Further, it will be appreciated from the following that this invention is also adaptable to any other aspect of a biological state of a biological system that is measurable.
  • the biological state of a biological system (e.g., a cell or cell culture)
  • a profile of some number of cellular constituents can be represented by the vector S.
  • $S_i$ is the level of the i'th cellular constituent, for example, the transcript level of gene i, or alternatively, the abundance or activity level of protein i.
  • Microarrays. Determining the relative abundance of diverse individual sequences in complex DNA samples is often accomplished using microarrays. See, e.g., Shalon et al., 1996, "A Microarray System for Analyzing Complex Samples Using Two-color Fluorescent Probe Hybridization," Genome Research 6:639-645. Frequently, transcript arrays are produced by hybridizing detectably labeled polynucleotides representing the mRNA transcripts present in a cell (e.g., fluorescently labeled cDNA synthesized from total cell mRNA) to a microarray.
  • a microarray is a surface with an ordered array of binding (e.g., hybridization) sites for products of many of the genes in the genome of a cell or organism, preferably most or almost all of the genes.
  • Microarrays are highly reproducible and therefore multiple copies of a given array can be produced and the nominal copies can be compared with each other.
  • microarrays are small, usually smaller than 5 cm², and made from materials that are stable under binding (e.g., nucleic acid hybridization) conditions.
  • a given binding site or unique set of binding sites in the microarray will specifically bind the product of a single gene in the cell.
  • the level of hybridization to the site in the array corresponding to any particular gene will reflect the prevalence in the cell of mRNA transcribed from that gene.
  • detectably labeled e.g., with a fluorophore
  • the site on the array corresponding to a gene i.e., capable of specifically binding the product of the gene
  • a gene for which the encoded mRNA is prevalent will have a relatively strong signal.
  • Microarrays are advantageous because nucleic acids representing two different pools of nucleic acid can be hybridized to a microarray and the relative signal from each pool can simultaneously be measured.
  • Each pool of nucleic acids may represent the state of a biological system before or after a perturbation.
  • a first nucleic acid pool may be derived from a mRNA pool from a cell culture before exposing the cell culture to a pharmacological agent and a second cDNA pool may be derived from a mRNA pool derived from the same culture after exposing the culture to a pharmacological agent.
  • the two pools of cDNA could represent pathway responses.
  • a first cDNA library could be derived from the mRNA of a first aliquot ("pool") of a cell culture that has been exposed to a pathway perturbation and a second cDNA library can be derived from the mRNA of a second aliquot ("pool") of the same cell culture wherein the second aliquot was not exposed to the pathway perturbation.
  • microarray experiments, including those described in this section, are referred to as "differential microarray experiments".
  • One skilled in the art will appreciate that many forms of differential microarray experiments other than the ones outlined in this disclosure are within the scope of the definition of "differential microarray experiments".
  • a differential intensity measurement refers to measurements made in differential microarray experiments.
  • a differential intensity measurement could be the difference between the brightness of a position on a microarray, which corresponds to a cellular constituent of interest, after (i) the microarray has been contacted with DNA derived from a biological system that represents a baseline state and (ii) the microarray has been contacted with DNA derived from a biological system that represents a perturbed state.
  • the baseline state of a biological system may represent the wild-type state of the biological system.
  • the baseline state of a biological system could represent a different perturbed state of the biological system.
  • Each microarray experiment in a differential microarray experiment, or repeated differential microarray experiment preferably utilizes the same or similar microarray.
  • Microarrays are considered similar if they are prepared from substantially isogenic biological systems and a majority of the binding spots on each microarray are common.
  • the microarray used in repeated microarray experiments may be the same identical microarray, wherein the microarray is washed between microarray experiments, or the microarray(s) used in repeated microarray experiments may be exact replicas of each other, or they may be similar to each other.
  • each cDNA pool is distinctively labeled with a different dye if the two-fluorophore microarray format is chosen.
  • each cDNA pool is labeled by deriving fluorescently-labeled cDNA by reverse transcription of polyA+ RNA in the presence of Cy3- (green) or Cy5- (red) deoxynucleotide triphosphates (Amersham).
  • the two cDNAs pools are mixed and hybridized to the microarray, the relative intensity of signal from each cDNA set is determined for each site on the array, and any relative difference in abundance of a particular mRNA detected.
  • the fluorescence emissions at each site of a microarray can be determined using scanning confocal laser microscopy.
  • a separate scan, using the appropriate excitation line is carried out for each of the two fluorophores used.
  • a laser may be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores can be analyzed simultaneously (See e.g. Shalon et al, supra).
  • the microarrays may be scanned with a laser fluorescent scanner with a computer controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser and the emitted light is split by wavelength and detected with two photomultiplier tubes.
  • Fluorescence laser scanning devices are described in Schena et al., 1996, Genome Res. 6:639-645 and in references cited herein.
  • the fiber-optic bundle described by Ferguson et al., 1996, Nature Biotech. 14:1681-1684 may be used to monitor mRNA abundance levels at a large number of sites simultaneously. Signals may be recorded and analyzed by computer, e.g., using a 12 bit analog to digital board.
  • the scanned image may be despeckled using a graphics program (e.g., Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridization at each wavelength at each site. If necessary, an experimentally determined correction for "cross talk" (or overlap) between the channels for the two fluorophores may be made.
  • microarray experiments refers to the general class of experiments that are described in this section.
  • microarray experiments may include the use of a single fluorophore rather than the two-fluorophore example described infra.
  • microarray experiments may be paired. If paired, the first microarray experiment in the pair could represent a nominal biological system representing a baseline state. The second microarray experiment in the pair could represent the nominal biological system after it has been subjected to a perturbation. Thus comparison of the paired microarray experiment would reveal changes in the state of the nominal biological system based upon the perturbation.
  • these pairs of microarray experiments are referred to as "differential microarray experiments".
  • Cell Expression Profiles. An advantage of using two different cDNA pools in microarray experiments is that a direct and internally controlled comparison of the mRNA levels corresponding to each arrayed gene in two cell states can be made. This and related techniques for quantitative measurement of cellular constituents are generally referred to as cell constituent profiling.
  • Cell constituent profiling is typically expressed as changes, either in absolute level or the ratio of levels, between two known cell conditions, such as a response to treatment of a baseline state with a pharmacological agent, as described in the previous section.
  • a ratio of the emission of the two fluorophores may be calculated for any particular hybridization site on a DNA transcript array. This ratio is independent of the absolute expression level of the cognate gene, but is useful for genes whose expression is significantly modulated by drug administration, gene deletion, or any other perturbation.
  • two-fluorophore cell expression profiles are typically plotted on an x-y graph. The horizontal axis represents the log10 of the ratio of the mean intensity (which approximately reflects the level of expression of a corresponding mRNA derived from a gene) between the first and second pool of cDNA for each site on the microarray.
  • the vertical axis represents the log10 of the ratio of the intensity measured for one fluorescent label, corresponding to the first pool of cDNA, to that measured for the other fluorescent label, corresponding to the second pool of cDNA, for each hybridization site on the microarray.
  • subscripts 1 and 2 refer to two independently extracted mRNA cultures in which abundances are being compared; $a_1(k)$ is the abundance of species k in mRNA culture 1; $a_2(k)$ is the abundance of species k in mRNA culture 2; subscripts X and Y represent the two different fluorescent labels used; and
  • $r_{X/Y}$ is the color ratio that ideally reflects the abundance ratio $a_1/a_2$.
  • Equation (1) ideally represents the measurement plotted on the vertical axis of Figures 2 through 6.
  • the use of a fluorophore labeled deoxynucleotide triphosphates affects the
  • $r_{X/Y}^{(\mathrm{rev})}(k) = \frac{a_2(k)\,E_X(k)}{a_1(k)\,E_Y(k)}$ (3) where $r_{X/Y}^{(\mathrm{rev})}$ is the color ratio in the reverse experiment; and $a_2(k)$, $a_1(k)$, $E_X(k)$, and $E_Y(k)$ are as described for equation (2).
  • Performing hybridization experiments in pairs, with the label assignment reversed in one member of the pair, allows for creation of a combined average measurement in which the fluorophore specific bias is sharply reduced.
  • a pair of two-fluorophore hybridization experiments may be performed.
  • the first two-fluorophore experiment would be performed in accordance with equation (2) and the second two-fluorophore hybridization experiment would be performed according to equation (3). If the log of the ratio of the two experiments is taken, the combined experiment can be expressed as:
  • Equation (4) can be written equivalently using ratios as found in equations (l)-(3) instead of differences of log ratios.
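The label-reversal combination described above can be illustrated with a short numerical sketch. Equations (2) and (4) are not reproduced in this text, so the sketch assumes that equation (2) has the form $r_{X/Y}(k) = a_1(k)E_X(k) / (a_2(k)E_Y(k))$, mirroring equation (3), and that the combined measurement is half the difference of the two log ratios. The function name and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def combine_reversed_pair(ratio_forward, ratio_reversed):
    """Combine forward and fluorophore-reversed color ratios, spot by spot.

    Returns a log10 abundance ratio in which a dye-specific efficiency
    factor common to both experiments cancels (assumed form of the
    combined measurement; not taken verbatim from the source).
    """
    fwd = np.asarray(ratio_forward, dtype=float)
    rev = np.asarray(ratio_reversed, dtype=float)
    # Half the difference of the log ratios: the E_X/E_Y term appears in
    # both measurements with the same sign and therefore drops out.
    return 0.5 * (np.log10(fwd) - np.log10(rev))

# Synthetic example: abundances and per-gene dye efficiencies are invented.
a1 = np.array([100.0, 50.0, 10.0])    # abundance in culture 1
a2 = np.array([100.0, 200.0, 5.0])    # abundance in culture 2
e_x = np.array([1.0, 1.3, 0.7])       # efficiency of label X per gene
e_y = np.array([1.0, 0.9, 1.4])       # efficiency of label Y per gene

r_fwd = a1 * e_x / (a2 * e_y)         # assumed form of equation (2)
r_rev = a2 * e_x / (a1 * e_y)         # equation (3)
combined = combine_reversed_pair(r_fwd, r_rev)
```

In this idealized setting the combined value equals log10(a1/a2) for every gene, regardless of how strongly the two labels differ in efficiency. Real data also carry random error, which the reversal does not remove.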
  • changes in constituent levels are most appropriately expressed as the logarithm of the ratio of abundance in the pair of conditions forming the differential measurement. This is because fold changes are more meaningful than changes in absolute level, biologically. This method of bias removal is particularly useful in two-color hybridization experiments.
  • Figure 4 illustrates the bias removal method of the present invention.
  • Figure 4a is a color ratio vs. intensity plot for a two-color hybridization experiment in which the two cultures used are nominally the same background strain of the yeast S. cerevisiae. Because the two cultures are nominally the same, it is expected that individual spots on the microarray would fluoresce with the same amount of intensity for both of the fluorophores used. Experimental methods are described in the experimental section infra. However, as is readily apparent from Figure 4a, some of the spots on the microarray exhibit fluorophore-specific intensity. For example, spots on the microarray, corresponding to various genes in the yeast S.
  • Figure 4b shows the result of the fluorophore-reversed version of the experiment plotted in Figure 4a.
  • the flagged genes in Figure 4b now have opposite bias.
  • Figure 4c shows the result of combining the data of Figures 4a and 4b according to the methods of the present invention described above. The biases of the flagged genes have been greatly reduced.
  • the procedure for bias removal as described above may be applied in other contexts. For example, if cultures must be grown at certain positions in an incubator, and harvested in a certain order, the positions and order for two culture types may be reversed in a subsequent experiment and the results combined as described to reduce subtle biases due to temperature or latency differences.
  • the prior art does not provide a clear method for optimally combining the results of multiple microarray experiments.
  • the results of several experiments could be averaged.
  • averaging does not provide information on the statistical significance of any given measurement for each specific gene of interest in the microarray experiments.
  • This section develops a sophisticated method for determining the statistical significance of the up- or down-regulation measured for particular genes of interest in multiple microarray experiments.
  • These methods could be applied to nominal repeats of a two-fluorophore DNA microarray experiment.
  • these methods could be applied to one or more repeats of pairs of experiments, in which the first experiment in the pair represents a baseline state and the second member of the paired repeats represents a biological state after a perturbation has been applied.
  • This rank combining has the advantage that it does not require any modeling of the detailed error behavior in the underlying hybridization experiments, other than the assumption of no systematic biases.
  • the rank based method is an example of a non-parametric statistical test for the significance of observed up- or down- regulations.
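A minimal sketch of rank-based combining is given here for orientation. It is not the patent's percentile-ranking formula, which is not reproduced in this text. It assumes only that, with no real regulation and independent repeats, a gene's rank in each repeat is uniformly distributed, so the chance that a gene ranks at least as high as observed in every repeat is the product of its fractional ranks. The function names and scores are hypothetical.

```python
import numpy as np

def fractional_ranks(values):
    """Fraction of genes ranked at or above each gene (1 = most up-regulated).

    Ties are broken by position; this is a simplification.
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(-values)                  # indices, largest value first
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks / len(values)

def chance_of_consistent_up_rank(repeats):
    """repeats: array of shape (n_repeats, n_genes) of up-regulation scores.

    Returns, per gene, the probability under the no-regulation assumption
    of ranking at least this high in every repeat.
    """
    repeats = np.asarray(repeats, dtype=float)
    fractions = np.vstack([fractional_ranks(row) for row in repeats])
    return fractions.prod(axis=0)

# Two invented repeats of four genes; gene 0 is consistently at the top.
scores = np.array([[3.0, 0.1, -0.2, 0.5],
                   [2.5, -0.3, 0.2, 0.1]])
p_up = chance_of_consistent_up_rank(scores)
```

A gene that tops the ranking in both repeats receives the smallest value, while genes whose ranks fluctuate between repeats do not. With only four genes the values are coarse; on a full microarray the same product separates consistent responders far more sharply.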
  • Percentile rankings such as equations (5) and (6) are based upon the assumption that the underlying error behavior is similar for all genes. This is not necessarily the case.
  • the weakly expressing genes as expressed by log
  • the more weakly a particular gene is expressed, the higher the tendency of the log10(expression ratio) of the gene from two nominal repeats of an experiment to deviate from zero.
  • the low-abundance (weakly expressing and hence low-intensity hybridization) genes will tend to occupy the tails of the distribution of expression ratios (i.e. deviate from zero in accordance with Figure 5) more often than the higher-abundance genes.
  • X and Y are the brightnesses of a probe spot on the microarray with respect to the X and Y fluorophores
  • $\sigma_X^2$ is a variance term for X and represents the additive error level in the X channel
  • $\sigma_Y^2$ is a variance term for Y and represents the additive error level in the Y channel
  • f is the fractional multiplicative error level
  • the first fluorophore (X) may optionally represent a biological system in a base line state whereas the second fluorophore (Y) may represent the biological system in a perturbed state.
  • the fractional multiplicative error, f, is empirically derived by fitting the denominator of equation (7) to the measured data.
  • the denominator of equation (7) is the expected standard error of the numerator, so d has unit variance. d is therefore an error distribution statistic that is independent of intensity, and therefore applicable to rank methods. Any other definition with the non-parametric properties of equation (7) is also a good variable to use in the rank methods.
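The statistic d can be sketched numerically. Equation (7) is not reproduced in this text, so the form used below, $d = (X - Y) / \sqrt{\sigma_X^2 + \sigma_Y^2 + f^2 (X^2 + Y^2)}$, is an assumption inferred from the term definitions above (an additive variance per channel plus a fractional multiplicative error). The function name and the numerical values are hypothetical.

```python
import numpy as np

def intensity_independent_d(x, y, sigma_x, sigma_y, f):
    """Difference of channel brightnesses scaled by its expected standard error.

    Assumed form of equation (7): additive variances sigma_x**2 and
    sigma_y**2 plus a fractional multiplicative error f on each channel.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    expected_error = np.sqrt(sigma_x**2 + sigma_y**2 + f**2 * (x**2 + y**2))
    return (x - y) / expected_error

# Invented example: one dim spot and one bright spot.
x = np.array([120.0, 10500.0])
y = np.array([100.0, 10000.0])
d = intensity_independent_d(x, y, sigma_x=20.0, sigma_y=20.0, f=0.1)
```

For the dim spot the additive terms dominate the denominator, while for the bright spot the multiplicative term dominates. This is what lets a single statistic be compared across the whole intensity range before ranking.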
  • the denominator of equation (7) is used to generate the intensity independent contour lines shown in Figure 5.
  • the choice of using grid lines of ±1 standard deviation according to the denominator of equation (7) is completely arbitrary.
  • the contour lines could be gridded at any convenient value such as 0.25σ, 0.5σ, 2σ as long as the contour lines are plotted in accordance with the denominator of equation (7) or a similar nonparametric representation of error.
  • contour lines follow the error envelope.
  • the errors are distributed with respect to the contours similarly at low and at high intensity, and d has the desired property.
  • One advantage of plotting contour lines is that the amount of error associated with each cellular constituent measured on the microarray can be calculated based on information derived from the variance of all the cellular constituents on the microarray across a plurality of measurements.
  • $\sigma_{X-Y}$ is the standard error (rms uncertainty) associated with that gene. This uncertainty may be derived from repeated control experiments where X and Y are derived from the same biological system, in which case $\sigma_{X-Y}$ is the observed standard deviation of X-Y for that gene over the set of experiments. This definition of d then is similarly distributed for all genes, and Equations (5) and (6) may be used with ranking d.
  • $\bar{x}$ is the weighted mean of the cellular constituent being measured
  • $x_i$, and each $\sigma_i^2$ is the variance of an individual $x_i$.
  • equation 5-6 in "Data Reduction and Error Analysis for the Physical Sciences", 1969, Bevington, McGraw-Hill, New York, which is incorporated by reference herein in its entirety.
  • Each $\sigma_i^2$ in equation (8) may be determined in a variety of ways.
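A minimal sketch of the weighted averaging follows, assuming Equation (8) is the standard inverse-variance weighted mean, i.e. the sum of x_i/σ_i² divided by the sum of 1/σ_i². The function name and the returned variance of the mean (one over the sum of the weights, the usual companion formula) are illustrative assumptions rather than text taken from the equations themselves.

```python
def weighted_mean(values, variances):
    """Inverse-variance weighted mean of repeated measurements.

    values:    individual measurements x_i
    variances: variance sigma_i^2 assigned to each x_i
    Returns (weighted mean, variance of the weighted mean).
    """
    if len(values) != len(variances) or not values:
        raise ValueError("need one variance per measurement")
    weights = [1.0 / v for v in variances]  # w_i = 1 / sigma_i^2
    total_weight = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total_weight
    return mean, 1.0 / total_weight
```

Measurements with a small assigned variance dominate the result, so a bright, well-measured repeat counts for more than a dim, noisy one.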
  • One approach is to calculate the error envelope for a microarray experiment using two nominal repeats of the two-fluorophore microarray experiment in which the only difference between the two experiments is that the two fluorophores utilized are reversed. See e.g. Fig 4. Alternatively, only one fluorophore could be utilized.
  • Figure 5 also illustrates intensity independent contour lines that are fitted in accordance with the denominator of equation (7).
  • the intensity ($x_i$) is plotted on the appropriate reference plot, such as Figure 5.
  • the intensity of individual measurements would be plotted along the horizontal axis.
  • $\sigma_i^2$ is calculated based upon the width of the ±1σ intensity independent contour lines at position $x_i$ on the reference plot.
  • a general formula for the uncertainty of the mean is
  • Equation (10) transitions from Equation (9) to the value of the observed scatter, $s_j$, as the number of repeats, N, becomes large.
  • $s_j$ is calculated according to traditional statistical methods, such that
    $$s_j^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 \qquad (11)$$
  • N is the number of measurements
  • x are individual measurements of the intensity of
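The observed scatter of Equation (11) can be sketched as below. This assumes the usual sample variance with an N - 1 denominator and, as a further assumption, takes the mean in the formula to be the plain arithmetic mean of the repeats; the function name is hypothetical.

```python
def observed_scatter_squared(measurements):
    """Sample variance s^2 of N repeated intensity measurements.

    s^2 = sum((x_i - mean)^2) / (N - 1)
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("at least two repeats are needed to estimate scatter")
    mean = sum(measurements) / n
    return sum((x - mean) ** 2 for x in measurements) / (n - 1)
```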
  • Figure 6a is the signature plot for a single experiment with the drug CsA, obtained as described in the experimental section infra.
  • the responses of a biological system to a perturbation can be measured by observing the changes in the biological state of the biological system.
  • a response profile is a collection of changes of cellular constituents.
  • the response profile of a biological system (e.g., a cell or cell culture) to the perturbation m may be defined as the vector $v^{(m)}$:
  • biological response to the application of a pharmacological agent is measured by the induced change in the transcript level of at least 2 genes, preferably more than 10 genes, more preferably more than 100 genes and most preferably more than 1,000 genes.
  • biological response profiles comprise simply the difference between biological variables before and after perturbation.
  • the biological response is defined as the ratio of cellular constituents before and after a perturbation is applied.
  • $v_i^{(m)}$ is set to zero if the response of gene i is below some threshold amplitude or confidence level determined from knowledge of the measurement error behavior. In such embodiments, those cellular constituents whose measured responses are lower than the threshold are given the response value of zero, whereas those cellular constituents whose measured responses are greater than the threshold retain their measured response values.
  • This truncation of the response vector is suitable when most of the smaller responses are expected to be greatly dominated by measurement error. After the truncation, the response vector $v^{(m)}$ also approximates a 'matched detector' (see, e.g., Van Trees, 1968, Detection, Estimation, and Modulation Theory, Vol. I).
  • truncation levels can be set based upon the purpose of detection and the measurement errors. For example, in some embodiments, genes whose transcript level changes are lower than two fold or more preferably four fold are given the value of zero.
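The truncation step can be illustrated with a short Python sketch. It assumes, for illustration, that responses are stored as log10 expression ratios and that a response exactly at the threshold is retained; both choices, and the function name, are assumptions rather than requirements stated above.

```python
import math


def truncate_response(log10_ratios, fold_threshold=2.0):
    """Zero out responses smaller than a fold-change threshold.

    log10_ratios:   response vector, one log10(expression ratio) per gene
    fold_threshold: e.g. 2.0 for two fold, 4.0 for four fold
    """
    cutoff = math.log10(fold_threshold)
    # Up- and down-regulation are treated symmetrically via abs().
    return [v if abs(v) >= cutoff else 0.0 for v in log10_ratios]
```

For example, with the default two-fold threshold a response of 0.1 (about 1.26 fold) is set to zero, while a response of -0.5 (about 3.2 fold down) is kept.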
  • perturbations are applied at several levels of strength. For example, different amounts of a drug may be applied to a biological system to observe its response.
  • the perturbation responses may be interpolated by approximating each by a single parameterized "model" function of the perturbation strength u.
  • An exemplary model function appropriate for approximating transcriptional state data is the Hill function, which has adjustable parameters a, $u_0$, and n.
  • the adjustable parameters are selected independently for each cellular constituent of the perturbation response.
  • the adjustable parameters are selected for each cellular constituent so that the sum of the squares of the differences between the model function (e.g., the Hill function, Equation 13) and the corresponding experimental data at each perturbation strength is minimized.
  • This preferable parameter adjustment method is known in the art as a least squares fit.
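A least squares fit of this kind might be sketched as follows. The Hill function is assumed here to take the common form H(u) = a (u/u_0)^n / (1 + (u/u_0)^n); the grid search over u_0 and n is a deliberately simple stand-in for a general-purpose optimizer, and all names are hypothetical.

```python
def hill(u, a, u0, n):
    """Assumed Hill form: a * (u/u0)^n / (1 + (u/u0)^n)."""
    r = (u / u0) ** n
    return a * r / (1.0 + r)


def fit_hill(strengths, responses, u0_grid, n_grid):
    """Least squares fit of (a, u0, n) for one cellular constituent.

    For each candidate (u0, n) the amplitude a enters linearly, so its
    least squares value has a closed form; the (u0, n) pair giving the
    smallest sum of squared differences is kept.
    Returns (sum_of_squares, a, u0, n).
    """
    best = None
    for u0 in u0_grid:
        for n in n_grid:
            shape = [hill(u, 1.0, u0, n) for u in strengths]
            denom = sum(h * h for h in shape)
            if denom == 0.0:
                continue
            a = sum(y * h for y, h in zip(responses, shape)) / denom
            sse = sum((y - a * h) ** 2 for y, h in zip(responses, shape))
            if best is None or sse < best[0]:
                best = (sse, a, u0, n)
    return best
```

The fit is run independently for each cellular constituent, matching the statement that the adjustable parameters are selected separately for each one.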
  • Other possible model functions are based on polynomial fitting. More detailed description of model fitting and biological response has been disclosed in Friend and Stoughton, Methods of Determining Protein Activity Levels Using Gene Expression Profiles, U.S. Provisional Application Serial No. 60/084,742, filed on May 8, 1998, which is incorporated herein by reference in its entirety for all purposes.
  • the methods of the invention are useful for comparing augmented profiles that contain any number of response profiles and/or projected profiles. Projected profiles are best understood after a discussion of genesets, which are co-regulated genes. Projected profiles are useful for analyzing many types of cellular constituents including genesets.
  • Genes tend to increase or decrease their rates of transcription together when they possess similar regulatory sequence patterns, i.e., transcription factor binding sites. This is the mechanism for coordinated response to particular signaling inputs (see, e.g., Madhani and Fink, 1998, The riddle of MAP kinase signaling specificity, Trends in Genetics 14:151-155; Arnone and Davidson, 1997, The hardwiring of development: organization and function of genomic regulatory systems, Development 124:1851-1864). Separate genes which make different components of a necessary protein or cellular structure will tend to co-vary. Duplicated genes (see, e.g., Wagner, 1996, Genetic redundancy caused by gene duplications and its evolution in networks of transcriptional regulators, Biol.
  • genes will not all vary independently, and that there are simplifying subsets of genes and proteins that will co-vary.
  • These co-varying sets of genes form a complete basis in the mathematical sense with which to describe all the profile changes within that finite set of conditions.
  • a preferred embodiment for identifying such basis genesets involves clustering algorithms (for reviews of clustering algorithms, see, e.g., Fukunaga, 1990, Statistical Pattern Recognition. 2nd Ed., Academic Press, San Diego; Everitt, 1974, Cluster Analysis, London: Heinemann Educ. Books; Hartigan, 1975, Clustering Algorithms, New York: Wiley; Sneath and Sokal, 1973, Numerical Taxonomy. Freeman; Anderberg, 1973, Cluster Analysis for Applications. Academic Press: New York).
  • the expression of a large number of genes is monitored as biological systems are subjected to a wide variety of perturbations.
  • a table of data containing the gene expression measurements is used for cluster analysis.
  • Cluster analysis operates on a table of data which has the dimension m x k wherein m is the total number of conditions or perturbations and k is the number of genes measured.
  • a number of clustering algorithms are useful for clustering analysis. Clustering algorithms use dissimilarities or distances between objects when forming clusters. In some embodiments, the distance used is Euclidean distance in multidimensional space:
  • I(x,y) is the distance between gene X and gene Y;
  • $X_i$ and $Y_i$ are gene expression responses under perturbation i.
  • the Euclidean distance may be squared to place progressively greater weight on objects that are further apart.
  • the distance measure may be the Manhattan distance, e.g., between gene X and gene Y, which is provided by:
  • $X_i$ and $Y_i$ are gene expression responses under perturbation i.
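The distance measures named above can be written out as a short Python sketch, using the textbook definitions of the Euclidean, squared Euclidean, and Manhattan distances. Each gene is represented by its list of responses across perturbations; the function names are illustrative.

```python
import math


def euclidean_distance(x, y):
    """sqrt(sum_i (X_i - Y_i)^2) over all perturbations i."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean_distance(x, y):
    """Squared form, which weights distant objects progressively more."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def manhattan_distance(x, y):
    """sum_i |X_i - Y_i| over all perturbations i."""
    return sum(abs(a - b) for a, b in zip(x, y))
```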
  • Various cluster linkage rules are useful for defining genesets.
  • Single linkage, a nearest neighbor method, determines the distance between the two closest objects.
  • complete linkage methods determine distance by the greatest distance between any two objects in the different clusters. This method is particularly useful in cases when genes or other cellular constituents form naturally distinct "clumps."
  • the unweighted pair-group average defines distance as the average distance between all pairs of objects in two different clusters. This method is also very useful for clustering genes or other cellular constituents to form naturally distinct "clumps."
  • the weighted pair-group average method may also be used. This method is the same as the unweighted pair-group average method except that the size of the respective clusters is used as a weight.
  • Cluster linkage rules such as the unweighted and weighted pair-group centroid and Ward's method are also useful for some embodiments of the invention. See, e.g., Ward, 1963, J. Am. Stat. Assn. 58:236; Hartigan, 1975, Clustering Algorithms, New York: Wiley.
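To make the first three linkage rules concrete, the sketch below computes the distance between two clusters under single, complete, and unweighted pair-group average linkage. Euclidean distance between objects is assumed for illustration (any of the distance measures discussed earlier could be substituted), and the function name is hypothetical.

```python
import math
from itertools import product


def linkage_distance(cluster_a, cluster_b, rule="average"):
    """Distance between two clusters of points under a linkage rule.

    cluster_a, cluster_b: lists of equal-length coordinate tuples
    rule: "single"   -> distance between the two closest objects
          "complete" -> greatest distance between any two objects
          "average"  -> mean distance over all cross-cluster pairs
    """
    pair_distances = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if rule == "single":
        return min(pair_distances)
    if rule == "complete":
        return max(pair_distances)
    if rule == "average":
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage rule: {rule}")
```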
  • Genesets may be defined based on the many smaller branches of a tree, or a small number of larger branches by cutting across the tree at different levels. The choice of cut level may be made to match the number of distinct response pathways expected. If little or no prior information is available about the number of pathways, then the tree should be divided into as many branches as are truly distinct. 'Truly distinct' may be defined by a minimum distance value between the individual branches. Typical values are in the range 0.2 to 0.4 where 0 is perfect correlation and 1 is zero correlation, but may be larger for poorer quality data or fewer experiments in the training set, or smaller in the case of better data and more experiments in the training set.
  • 'truly distinct' may be defined with an objective test of statistical significance for each bifurcation in the tree.
  • the Monte Carlo randomization of the experiment index for each cellular constituent's responses across the set of experiments is used to define an objective test.
  • the objective test is defined in the following manner:
  • $D_k$ is the square of the distance measure for constituent k with respect to the center (mean) of its assigned cluster.
  • Superscript 1 or 2 indicates whether it is with respect to the center of the entire branch or with respect to the center of the appropriate cluster out of the two subclusters.
  • D = 1 - r, where r is the correlation coefficient between the responses of one constituent across the experiment set vs. the responses of the other (or vs. the mean cluster response).
  • the distribution of fractional improvements obtained from the Monte Carlo procedure is an estimate of the distribution under the null hypothesis that a given branching was not significant.
  • the actual fractional improvement for that branching with the unpermuted data is then compared to the cumulative probability distribution from the null hypothesis to assign significance.
  • Standard deviations are derived by fitting a log normal model for the null hypothesis distribution. Using this procedure, a standard deviation greater than about 2, for example, indicates that the branching is significant at the 95% confidence level.
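A simplified sketch of such a Monte Carlo test is given below. Several assumptions are made for illustration: the fractional improvement is taken to be 1 - (sum of $D_k^{(2)}$) / (sum of $D_k^{(1)}$); the two-way split is held fixed rather than being re-derived by re-clustering each permuted data set, as a fuller implementation would do; and significance is reported as an empirical fraction rather than through the log normal fit. All function names are hypothetical.

```python
import random
from statistics import mean


def correlation(u, v):
    """Correlation coefficient r between two response vectors."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # a flat vector carries no correlation information
    return cov / (var_u * var_v) ** 0.5


def summed_sq_distance(profiles):
    """Sum over constituents of (1 - r)^2 to the cluster mean response."""
    center = [mean(col) for col in zip(*profiles)]
    return sum((1.0 - correlation(p, center)) ** 2 for p in profiles)


def fractional_improvement(sub_a, sub_b):
    """Assumed form: 1 - sum(D2) / sum(D1) for a two-way branching."""
    whole = summed_sq_distance(sub_a + sub_b)
    split = summed_sq_distance(sub_a) + summed_sq_distance(sub_b)
    return 1.0 - split / whole if whole > 0 else 0.0


def branch_p_value(sub_a, sub_b, n_permutations=200, seed=0):
    """Empirical p-value of a branching under experiment-index shuffling."""
    rng = random.Random(seed)
    observed = fractional_improvement(sub_a, sub_b)
    exceed = 0
    for _ in range(n_permutations):
        # Permute the experiment index independently for each constituent.
        shuffled_a = [rng.sample(p, len(p)) for p in sub_a]
        shuffled_b = [rng.sample(p, len(p)) for p in sub_b]
        if fractional_improvement(shuffled_a, shuffled_b) >= observed:
            exceed += 1
    return observed, exceed / n_permutations
```

Each argument is a list of response profiles (one list per constituent, one entry per experiment). A small returned fraction indicates that the observed improvement is rarely matched once the experiment ordering is scrambled.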
  • Genesets defined by cluster analysis typically have underlying biological significance.
  • a set of basis vectors V has k × n dimensions, where k is the number of genes and n is the number of genesets.
  • $V^{(n)}_k$ is the amplitude contribution of gene index k in basis vector n.
  • $V^{(n)}_k$ is proportional to the response of gene k in geneset n over the training data set used to define the genesets.
  • the elements V are normalized so that each $V^{(n)}_k$ has unit length by dividing by the square root of the number of genes in geneset n. This produces basis vectors which are not only orthogonal (the genesets derived from cutting the clustering tree are disjoint), but also orthonormal (unit length). With this choice of normalization, random measurement errors in profiles project onto the $V^{(n)}_k$ in such a way that the amplitudes tend to be comparable for each n. Normalization prevents large genesets from dominating the results of similarity calculations.
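The normalization can be illustrated with a small Python sketch. It assumes the simplest case in which each gene contributes an amplitude of 1 to the one geneset it belongs to and 0 elsewhere, before division by the square root of the geneset size; the function names are hypothetical.

```python
import math


def geneset_basis(genesets, n_genes):
    """Build a k x n matrix of basis vectors from disjoint genesets.

    genesets: list of lists of gene indices (disjoint)
    n_genes:  total number of genes k
    Entry [k][n] is 1/sqrt(size of geneset n) if gene k is a member of
    geneset n, else 0, so each column has unit length.
    """
    basis = [[0.0] * len(genesets) for _ in range(n_genes)]
    for n, members in enumerate(genesets):
        norm = math.sqrt(len(members))
        for k in members:
            basis[k][n] = 1.0 / norm
    return basis


def column_dot(basis, a, b):
    """Dot product of basis vectors a and b (columns of the matrix)."""
    return sum(row[a] * row[b] for row in basis)
```

With disjoint genesets, `column_dot` is 1 for a vector with itself and 0 for two different vectors, which is the orthonormality property described above.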
  • Genesets can also be defined based upon the mechanism of the regulation of genes. Genes whose regulatory regions have the same transcription factor binding sites are more likely to be co-regulated. In some preferred embodiments, the regulatory regions of the genes of interest are compared using multiple alignment analysis to decipher possible shared transcription factor binding sites (Stormo and Hartzell, 1989, Identifying protein binding sites from unaligned DNA fragments, Proc Natl Acad Sci 86:1183-1187; Hertz and Stormo, 1995, Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps, Proc of 3rd Intl Conf on Bioinformatics and Genome Research.
  • Co-regulated genes are not limited to those with binding sites for the same transcriptional factor.
  • Co-regulated (co-varying) genes may be in the up-stream/down-stream relationship where the products of up-stream genes regulate the activity of down-
  • K-means clustering may be used to cluster genesets when the regulation of genes of interest is partially known. K-means clustering is particularly useful in cases where the number of genesets is predetermined by the
  • K-means clustering is constrained to produce exactly the number of clusters desired. Therefore, if promoter sequence comparison indicates the measured genes should fall into three genesets, K-means clustering may be used to generate exactly three genesets with greatest possible distinction between clusters.
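A bare-bones K-means sketch is shown below for illustration. It uses the standard Lloyd iteration with Euclidean distance and takes the starting centers as an explicit argument (one per desired geneset); in practice a library implementation with multiple random restarts would normally be preferred. The function name and argument layout are assumptions.

```python
import math


def kmeans(points, initial_centers, n_iter=100):
    """Cluster points into exactly len(initial_centers) groups.

    points:          list of coordinate tuples (one per gene)
    initial_centers: starting centers, one per desired cluster
    Returns (labels, centers).
    """
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```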
  • the expression value of genes can be converted into the expression value for genesets. This process is referred to as projection.
  • the projection is as follows:
  • the value of geneset expression is simply the average of the expression value of the genes within the geneset. In some other embodiments, the average is weighted so that highly expressed genes do not dominate the geneset value.
  • the collection of the expression values of the genesets is the projected profile.
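The averaging form of projection can be sketched as follows. The optional per-gene weights are an illustrative way to express the weighted-average embodiment; how the weights are chosen is not specified here, and the function name is hypothetical.

```python
def project_profile(profile, genesets, weights=None):
    """Convert a per-gene profile into a per-geneset projected profile.

    profile:  expression value for each gene, indexed by gene
    genesets: list of lists of gene indices
    weights:  optional per-gene weights; if omitted, a simple average
              of the member genes is used
    """
    if weights is None:
        weights = [1.0] * len(profile)
    projected = []
    for members in genesets:
        total_weight = sum(weights[k] for k in members)
        weighted_sum = sum(weights[k] * profile[k] for k in members)
        projected.append(weighted_sum / total_weight)
    return projected
```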
  • projected profiles $P_i$ may be obtained for any set of profiles indexed by i. Similarities between the $P_i$ may be more clearly seen than between the original profiles $p_i$ for two reasons. First, measurement errors in extraneous genes have been excluded or averaged out. Second, the basis genesets tend to capture the biology of the profiles $p_i$ and so are matched detectors for their individual response components.
  • Classification and clustering of the profiles both are based on an objective similarity metric, call it S, where one useful definition is
  • This definition is the generalized angle cosine between the vectors $P_i$ and $P_j$. It is the projected version of the conventional correlation coefficient between $p_i$ and $p_j$. Profile $p_i$ is deemed most similar to that other profile $p_j$ for which $S_{ij}$ is maximum.
  • New profiles may be classified according to their similarity to profiles of known biological significance, such as the response patterns for known drugs or perturbations in specific biological pathways. Sets of new profiles may be clustered using the distance metric
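As a rough illustration, the similarity metric and a permutation-based estimate of how often such a similarity arises by chance might be sketched as below. The similarity is taken to be the ordinary angle cosine between projected profiles, and projection is assumed to be simple averaging over genesets; the function names, the default of 200 permutations, and the fixed random seed are illustrative assumptions.

```python
import math
import random


def similarity(p_i, p_j):
    """Angle cosine between two projected profiles."""
    dot = sum(a * b for a, b in zip(p_i, p_j))
    norm = math.sqrt(sum(a * a for a in p_i)) * math.sqrt(sum(b * b for b in p_j))
    return dot / norm if norm > 0 else 0.0


def project(profile, genesets):
    """Simple-average projection of a gene profile onto genesets."""
    return [sum(profile[k] for k in g) / len(g) for g in genesets]


def chance_probability(profile, other_profile, genesets,
                       n_permutations=200, seed=0):
    """Observed similarity and the fraction of permutations exceeding it.

    The constituent index of `profile` is randomly permuted before
    projection; `other_profile` is left unchanged.
    """
    rng = random.Random(seed)
    target = project(other_profile, genesets)
    observed = similarity(project(profile, genesets), target)
    exceed = 0
    for _ in range(n_permutations):
        shuffled = rng.sample(profile, len(profile))
        if similarity(project(shuffled, genesets), target) > observed:
            exceed += 1
    return observed, exceed / n_permutations
```

A new profile could then be classified by computing `similarity` against each reference profile and choosing the reference with the largest value.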
  • the statistical significance of any observed similarity $S_{ij}$ may be assessed using an empirical probability distribution generated under the null hypothesis of no correlation. This distribution is generated by performing the projection, Equations (19) and (20), for many different random permutations of the constituent index in the original profile p. That is, the ordered set $p_k$ is replaced by $p_{\Pi(k)}$, where $\Pi(k)$ is a permutation, for ~100 to 1000 different random permutations. The probability of the similarity $S_{ij}$ arising by chance is then the fraction of these permutations for which the similarity $S_{ij}$ (permuted) exceeds the similarity observed using the original unpermuted data.
5.7 METHODS FOR DETERMINING BIOLOGICAL RESPONSE PROFILES
  • This section provides some exemplary methods for measuring biological responses as well as the procedures necessary to make the reagents used in such methods.
  • Microarrays are known in the art and consist of a surface to which probes that correspond in sequence to gene products (e.g., cDNAs, mRNAs, cRNAs, polypeptides, and fragments thereof) can be specifically hybridized or bound at a known position.
  • the microarray is an array (i.e., a matrix) in which each position represents a discrete binding site for a product encoded by a gene (e.g., a protein or RNA), and in which binding sites are present for products of most or almost all of the genes in the organism's genome.
  • the "binding site" is a nucleic acid or nucleic acid analogue to which a particular cognate cDNA can specifically hybridize.
  • the nucleic acid or analogue of the binding site can be, e.g., a synthetic oligomer, a full-length cDNA, a less-than full length cDNA, or a gene fragment.
  • although the microarray contains binding sites for products of all or almost all genes in the target organism's genome, such comprehensiveness is not necessarily required.
  • the microarray will have binding sites corresponding to at least about 50% of the genes in the genome, often at least about 75%, more often at least about 85%, even more often more than about 90%, and most often at least about 99%.
  • the microarray has binding sites for genes relevant to the action of a drug of interest or in a biological pathway of interest.
  • a "gene" is an open reading frame (ORF) of preferably at least 50, 75, or 99 amino acids from which a messenger RNA is transcribed in the organism (e.g., if a single cell) or in some cell in a multicellular organism.
  • the number of genes in a genome can be estimated from the number of mRNAs expressed by the organism, or by extrapolation from a well-characterized portion of the genome.
  • the number of ORFs can be determined and mRNA coding regions identified by analysis of the DNA sequence.
  • the Saccharomyces cerevisiae genome has been completely sequenced and is reported to have approximately 6275 open reading frames (ORFs) longer than 99 amino acids. Analysis of these ORFs indicates that there are 5885 ORFs that are likely to specify protein products (Goffeau et al., 1996, Life with 6000 genes, Science 274:546-567, which is incorporated by reference in its entirety for all purposes).
  • the human genome is estimated to contain approximately $10^5$ genes.
5.7.2 PREPARING NUCLEIC ACIDS FOR MICROARRAYS
  • the "binding site" to which a particular cognate cDNA specifically hybridizes is usually a nucleic acid or nucleic acid analogue attached at that binding site.
  • the binding sites of the microarray are DNA polynucleotides
  • DNAs can be obtained by, e.g., polymerase chain reaction (PCR) amplification of gene segments from genomic DNA, cDNA (e.g., by RT-PCR), or cloned sequences.
  • PCR primers are chosen, based on the known sequence of the genes or cDNA, that result in amplification of unique fragments (i.e.,
  • each gene fragment on the microarray will be between about 50 bp and about 2000 bp, more typically between about 100 bp and about 1000 bp, and usually between about 300 bp and about 800 bp in length.
  • PCR methods are well known and are described, for example, in Innis et al. eds., 1990, PCR Protocols: A Guide to Methods and Applications, Academic Press Inc., San
  • nucleic acid for the microarray is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using H-phosphonate or phosphoramidite chemistries (Froehler et al., 1986, Nucleic Acids Res. 14:5399-5407; McBride et al., 1983, Tetrahedron Lett. 24:245-248). Synthetic sequences are between about 15 and about 500
  • nucleic acid analogues may be used as binding sites for hybridization.
  • An example of a suitable nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al, 1993, PNA hybridizes to complementary oligonucleotides obeying the Watson-Crick hydrogen-bonding rules, Nature
  • the binding (hybridization) sites are made from plasmid or phage clones of genes, cDNAs (e.g., expressed sequence tags), or inserts therefrom (Nguyen et al., 1995, Differential gene expression in the murine thymus assayed by quantitative hybridization of arrayed cDNA clones, Genomics 29:207-209).
  • the polynucleotide of the binding sites is RNA.
  • the nucleic acids or analogues are attached to a solid support, which may be made from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, or other materials.
  • a preferred method for attaching the nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al., 1995, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270:467-470. This method is especially useful for preparing microarrays of cDNA.
  • a second preferred method for making microarrays is by making high-density oligonucleotide arrays.
  • Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface using photolithographic techniques for synthesis in situ (see, Fodor et al., 1991, Light-directed spatially addressable parallel chemical synthesis, Science 251:767-773; Pease et al., 1994, Light-directed oligonucleotide arrays for rapid DNA sequence analysis, Proc. Natl. Acad. Sci.
  • oligonucleotides (e.g., 20-mers) of known sequence are synthesized directly on a surface such as a derivatized glass slide.
  • the array produced contains multiple probes against each target transcript.
  • Oligonucleotide probes can be chosen to detect alternatively spliced mRNAs or to serve as various types of control.
  • Another preferred method of making microarrays is by use of an inkjet printing process to synthesize oligonucleotides directly on a solid phase, as described, e.g., in co-pending U.S. patent application Serial No. 09/008,120 filed on January 16, 1998, by Blanchard entitled "Chemical Synthesis Using Solvent Microdroplets", which is incorporated by reference herein in its entirety.
  • microarrays e.g., by masking
  • any type of array, for example, dot blots on a nylon hybridization membrane (see Sambrook et al., Molecular Cloning - A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1989), could be used, although, as will be recognized by those of skill in the art, very small arrays will be preferred because hybridization volumes will be smaller.
  • RNA is extracted from cells of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin et al., 1979, Biochemistry 18:5294-5299).
  • Poly(A)+ RNA is selected with oligo-dT cellulose (see Sambrook et al., supra).
  • Cells of interest include wild-type cells, drug-exposed wild-type cells, modified cells, and drug-exposed modified cells.
  • Labeled cDNA is prepared from mRNA by oligo dT-primed or random-primed reverse transcription, both of which are well known in the art (see, e.g., Klug and Berger, 1987, Methods Enzymol. 152:316-325). Reverse transcription may be carried out in the presence of a dNTP conjugated to a detectable label, most preferably a fluorescently labeled dNTP. Alternatively, isolated mRNA can be converted to labeled antisense RNA synthesized by in vitro transcription of double-stranded cDNA in the presence of labeled dNTPs (Lockhart et al.
  • the cDNA or RNA probe can be synthesized in the absence of detectable label and may be labeled subsequently, e.g., by incorporating biotinylated dNTPs or rNTP, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent.
  • When fluorescently-labeled probes are used, many suitable fluorophores are known, including fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others (see, e.g., Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press San Diego, CA). It will be appreciated that pairs of fluorophores are chosen that have distinct emission spectra so that they can be easily distinguished.
  • a label other than a fluorescent label is used.
  • a radioactive label, or a pair of radioactive labels with distinct emission spectra, can be used (see Zhao et al., 1995, High density cDNA filter analysis: a novel approach for large-scale, quantitative analysis of gene expression, Gene 156:207; Pietu et al., 1996, Novel gene transcripts preferentially expressed in human muscles revealed by quantitative hybridization of a high density cDNA array, Genome Res. 6:492).
  • use of radioisotopes is a less-preferred embodiment.
  • labeled cDNA is synthesized by incubating a mixture containing 0.5 mM dGTP, dATP and dCTP plus 0.1 mM dTTP plus fluorescent deoxyribonucleotides (e.g., 0.1 mM Rhodamine 110 UTP (Perkin Elmer Cetus) or 0.1 mM Cy3 dUTP (Amersham)) with reverse transcriptase (e.g., Superscript™ II, LTI Inc.) at 42 °C for 60 minutes.
  • Nucleic acid hybridization and wash conditions are optimally chosen so that the probe "specifically binds" or "specifically hybridizes" to a specific array site, i.e., the probe hybridizes, duplexes or binds to a sequence array site with a complementary nucleic acid sequence but does not hybridize to a site with a non-complementary nucleic acid sequence.
  • One polynucleotide sequence is considered complementary to another when, if the shorter of the polynucleotides is less than or equal to 25 bases, there are no mismatches using standard base-pairing rules or, if the shorter of the polynucleotides is longer than 25 bases, there is no more than a 5% mismatch.
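The complementarity rule just stated can be expressed as a short Python check. For simplicity the sketch assumes two DNA strands of equal length, both written 5' to 3' and already aligned over the length of the shorter polynucleotide, with only the four standard bases; the function names are hypothetical.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def is_complementary(strand_a, strand_b):
    """Apply the mismatch rule to two aligned, equal-length strands.

    25 bases or fewer: no mismatches are allowed.
    More than 25 bases: at most 5% of positions may mismatch.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("this sketch expects equal-length, aligned strands")
    length = len(strand_a)
    mismatches = sum(
        1 for a, b in zip(strand_a.upper(), reverse_complement(strand_b)) if a != b
    )
    if length <= 25:
        return mismatches == 0
    # Integer arithmetic avoids floating-point issues at exactly 5%.
    return mismatches * 100 <= 5 * length
```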
  • the polynucleotides are perfectly complementary (no mismatches). It can easily be demonstrated that specific hybridization conditions result in specific hybridization by carrying out a hybridization assay including negative controls (see, e.g., Shalon et al., supra, and Chee et al., supra). Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA, DNA, PNA) of labeled probe and immobilized polynucleotide or oligonucleotide.
  • FIG. 7 illustrates an exemplary computer system suitable for implementation of the analytic methods of this invention.
  • Computer system 501 is illustrated as comprising internal components and being linked to external components.
  • the internal components of this computer system include processor element 502 interconnected with main memory 503.
  • computer system 501 can be an Intel 8086-, 80386-, 80486-, Pentium®, or Pentium®-based processor with preferably 32 MB or more of main memory.
  • the external components include mass storage 504.
  • This mass storage can be one or more hard disks (which are typically packaged together with the processor and memory). Such hard disks are preferably of 1 GB or greater storage capacity.
  • Other external components include user interface device 505, which can be a monitor, together with inputting device 506, which can be a "mouse", or other graphic input devices (not illustrated), and/or a keyboard.
  • a printing device 508 can also be attached to the computer 501.
  • computer system 501 is also linked to network link 507, which can be part of an Ethernet link to other local computer systems, remote computer systems, or wide area communication networks, such as the Internet.
  • This network link allows computer system 501 to share data and processing tasks with other computer systems.
  • Software component 510 represents the operating system, which is responsible for managing computer system 501 and its network interconnections. This operating system can be, for example, of the Microsoft Windows family, such as Windows 3.1, Windows 95, Windows 98, or Windows NT.
  • Software component 511 represents common languages and functions conveniently present on this system to assist programs implementing the methods specific to this invention. Many high or low level computer languages can be used to program the analytic methods of this invention. Instructions can be interpreted during runtime or compiled.
  • Preferred languages include C/C++, FORTRAN and JAVA®.
  • the methods of this invention are programmed in mathematical software packages that allow symbolic entry of equations and high-level specification of processing, including algorithms to be used, thereby freeing a user of the need to procedurally program individual equations or algorithms.
  • Such packages include Matlab from Mathworks (Natick, MA), Mathematica from Wolfram Research (Champaign, IL), or S-Plus from MathSoft (Cambridge, MA).
  • software component 512 and/or 513 represents the analytic methods of this invention as programmed in a procedural language or symbolic package.
  • a user first loads differential microarray experiment data into the computer system 501.
  • a user first loads microarray experiment data into the computer system. This data is loaded into the memory from the storage media (504) or from a remote computer, preferably from a dynamic geneset database system, through the network (507). Next the user causes execution of software that performs the steps of fluorophore bias removal, the rank-based methods of the present invention or the weighted averaging protocols of the present invention.
  • Alternative computer systems and software for implementing the analytic methods of this invention will be apparent to one of skill in the art and are intended to be comprehended within the accompanying claims.
  • The accompanying claims are intended to include the alternative program structures for implementing the methods of this invention that will be readily apparent to one of skill in the art.
  • CsA was added to a concentration of 30 µg/ml.
  • Cells were broken by standard procedures (see, e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (New York), 12.12.1 - 13.12.5) with the following modifications.
  • Cell pellets were resuspended in breaking buffer (0.2M Tris HCl pH 7.6, 0.5M NaCl, 10 mM EDTA, 1% SDS), vortexed for 2 minutes on a VWR multitube vortexer at setting 8 in the presence of
  • Fluorescently-labeled cDNA was prepared, purified and hybridized essentially as described by DeRisi et al. (DeRisi et al., 1997, Exploring the metabolic and genetic control of gene expression on a genomic scale, Science 278:680-686). Briefly, Cy3- or Cy5-dUTP (Amersham) was incorporated into cDNA during reverse transcription (Superscript II, LTI, Inc.) and purified by concentrating to less than 10 µl using Microcon-30 microconcentrators (Amicon).
  • Paired cDNAs were resuspended in 20-26 µl hybridization solution (3x SSC, 0.75 µg/ml poly A DNA, 0.2% SDS) and applied to the microarray under a 22x30 mm coverslip for 6 hr at 63 °C, all according to DeRisi et al. (1997), supra.
  • PCR products containing common 5' and 3' sequences were used as templates with amino-modified forward primer and unmodified reverse primers to PCR amplify 6065 ORFs from the S. cerevisiae genome.
  • First pass success rate was 94%.
  • Amplification reactions that gave products of unexpected sizes were excluded from subsequent analysis.
  • ORFs that could not be amplified from purchased templates were amplified from genomic DNA. DNA samples from 100 µl reactions were isopropanol precipitated, resuspended in water, brought to 3x SSC in a total volume of 15 µl, and transferred to 384-well microtiter plates (Genetix).
  • PCR products were spotted onto 1x3 inch polylysine-treated glass slides by a robot built according to specifications provided in Schena et al., supra; DeRisi et al., 1996, Discovery and analysis of inflammatory disease-related genes using microarrays, PNAS USA 94:2150-2155; and DeRisi et al. (1997). After printing, slides were processed following published protocols. See DeRisi et al. (1997).
  • Microarrays were imaged on a prototype multi-frame CCD camera in development at
  • Each CCD image frame was approximately 2 mm square. Exposures of 2 sec in the Cy5 channel (white light through Chroma 618-648 nm excitation filter, Chroma 657-727 nm emission filter) and 1 sec in the Cy3 channel (Chroma 535-560 nm excitation filter, Chroma 570-620 nm emission filter) were taken consecutively in each frame before moving to the next, spatially contiguous frame. Color isolation between the Cy3 and Cy5 channels was approximately 100:1 or better. Frames were knitted together in software to make the complete images.
  • The intensities of spots (~100 µm) were quantified from the 10 µm pixels by frame background subtraction and intensity averaging in each channel. The dynamic range of the resulting spot intensities was typically a ratio of 1000 between the brightest spots and the background-subtracted additive error level. Normalization between the channels was accomplished by normalizing each channel to the mean intensities of all genes. This procedure is nearly equivalent to normalization between channels using the intensity ratio of genomic DNA spots (see DeRisi et al., 1997), but is possibly more robust since it is based on the intensities of several thousand spots distributed over the array.
  • x_k is the log10 of the expression ratio for the k'th gene in the x signature
  • y_k is the log10 of the expression ratio for the k'th gene in the y signature.
  • The summation is over those genes that were either up- or down-regulated in either experiment at the 95% confidence level. These genes each had a less than 5% chance of being actually unregulated (having expression ratios departing from unity due to measurement errors alone).
  • This confidence level was assigned based on an error model which assigns a lognormal probability distribution to each gene's expression ratio with characteristic width based on the observed scatter in its repeated measurements (repeated arrays at the same nominal experimental conditions) and on the individual array hybridization quality. This latter dependence was derived from control experiments in which both Cy3 and Cy5 samples were derived from the same RNA sample. For large numbers of repeated measurements the error reduces to the observed scatter. For a single measurement the error is based on the array quality and the spot intensity.
  • The 1 µg/ml FK506 treatment signature was compared to over 40 unrelated deletion
  • End-to-end checks on expression ratio measurement accuracy were provided by analyzing the variance in repeated hybridizations using the same mRNA labeled with both Cy3 and Cy5, and also using Cy3 and Cy5 mRNA samples isolated from independent cultures of the same nominal strain and conditions. Biases undetected with this procedure, such as gene-specific biases presumably due to differential incorporation of Cy3- and Cy5-dUTP into cDNA, were minimized by performing hybridizations in fluorophore-reversed pairs, in which the Cy3/Cy5 labeling of the biological conditions was reversed in one experiment with respect to the other. The expression ratio for each gene is then the ratio of ratios between the two experiments in the pair. Other biases are removed by algorithmic numerical detrending.
  • The magnitude of these biases in the absence of detrending and fluorophore reversal is typically on the order of 30% in the ratio, but may be as high as twofold for some ORFs.
  • Expression ratios are based on mean intensities over each spot. The occasional smaller spots have fewer image pixels in the average. This does not degrade accuracy noticeably until the number of pixels falls below ten, in which case the spot is rejected from the data set.
  • Wander of spot positions with respect to the nominal grid is adaptively tracked in array subregions by the image processing software.
  • Unequal spot wander within a subregion greater than half a spot spacing is problematic for the automated quantitating algorithms; in this case the spot is rejected from analysis based on human inspection of the wander. Any partially overlapping spots are excluded from the data set. Typically, less than 1% of spots are rejected for these reasons.
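
The channel normalization, pixel-count rejection, and fluorophore-reversed combination described in the items above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software of the invention. The function names and the dictionary layout are hypothetical, and the channel mean is taken over all supplied spots before any rejection. The text specifies only a "ratio of ratios" for a reversed pair; halving the log difference (a square root in ratio space) is an added assumption that keeps the result on the scale of a single-experiment ratio.

```python
import math

MIN_PIXELS = 10  # the text rejects spots averaged over fewer than ten pixels


def normalize_channel(intensities):
    """Scale one channel so that its mean over all supplied spots is 1.

    `intensities` maps gene name -> background-subtracted mean spot intensity.
    """
    mean = sum(intensities.values()) / len(intensities)
    return {gene: value / mean for gene, value in intensities.items()}


def log10_ratios(cy5, cy3, pixel_counts, min_pixels=MIN_PIXELS):
    """Return gene -> log10(Cy5/Cy3) for one hybridization.

    Each channel is normalized to its own mean first. Spots with fewer than
    `min_pixels` pixels, or with a non-positive intensity in either channel,
    are left out of the result.
    """
    norm5 = normalize_channel(cy5)
    norm3 = normalize_channel(cy3)
    ratios = {}
    for gene in norm5:
        if pixel_counts[gene] < min_pixels:
            continue  # too few pixels in the spot average
        if norm5[gene] <= 0 or norm3[gene] <= 0:
            continue  # log ratio is undefined
        ratios[gene] = math.log10(norm5[gene] / norm3[gene])
    return ratios


def combine_reversed_pair(forward, reverse):
    """Combine log10 ratios from a fluorophore-reversed pair.

    `forward` comes from the hybridization with condition A in Cy5 and B in
    Cy3; `reverse` comes from the one with the labels swapped. Only genes
    present in both are kept. Assumption: the log of the ratio of ratios is
    halved so that the output estimates log10(A/B) once, not twice.
    """
    return {
        gene: 0.5 * (forward[gene] - reverse[gene])
        for gene in forward
        if gene in reverse
    }


# Hypothetical usage with made-up intensities for three spots.
cy5 = {"a": 200.0, "b": 400.0, "c": 600.0}
cy3 = {"a": 100.0, "b": 200.0, "c": 300.0}
pixels = {"a": 50, "b": 50, "c": 5}  # spot "c" is too small and is dropped
single_array_ratios = log10_ratios(cy5, cy3, pixels)
```

A multiplicative, gene-specific dye bias adds the same term to a gene's log ratio in both hybridizations of a reversed pair, so it cancels in the difference taken by combine_reversed_pair, while the biological log ratio changes sign between the two and is therefore retained.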
PCT/US1999/030837 1998-12-28 1999-12-27 Statistical combining of cell expression profiles WO2000039339A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CA2356696A CA2356696C (en) 1998-12-28 1999-12-27 Statistical combining of cell expression profiles
AU23855/00A AU774830B2 (en) 1998-12-28 1999-12-27 Statistical combining of cell expression profiles
EP99967594A EP1141411A4 (en) 1998-12-28 1999-12-27 STATISTICAL COMBINATION OF CELL EXPRESSION PROFILES
JP2000591227A JP2002533701A (ja) 1998-12-28 1999-12-27 細胞発現プロファイルの統計的組合せ

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/222,596 1998-12-28
US09/222,596 US6351712B1 (en) 1998-12-28 1998-12-28 Statistical combining of cell expression profiles

Publications (1)

Publication Number Publication Date
WO2000039339A1 (en) 2000-07-06

Family

ID=22832874

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/030837 WO2000039339A1 (en) 1998-12-28 1999-12-27 Statistical combining of cell expression profiles

Country Status (7)

Country Link
US (6) US6351712B1
EP (1) EP1141411A4
JP (1) JP2002533701A
CN (1) CN1335893A
AU (1) AU774830B2
CA (1) CA2356696C
WO (1) WO2000039339A1

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001029261A2 (en) * 1999-10-15 2001-04-26 E.I. Du Pont De Nemours And Company A method for high-density microarray mediated gene expression profiling
WO2002004676A2 (en) * 2000-07-10 2002-01-17 Incyte Genomics, Inc. Composite and averaged hybridizations
JP2002065259A (ja) * 2000-08-24 2002-03-05 Shinya Watanabe 核酸標識方法および核酸標識用キット
WO2002052038A2 (en) * 2000-12-27 2002-07-04 Geneka Biotechnology Inc. Method for normalizing the relative intensities of detection signals in hybridization arrays
WO2003070938A1 (fr) * 2002-02-21 2003-08-28 Ajinomoto Co., Inc. Analyseur de donnees d'expression genique et procede, programme et support d'enregistrement pour l'analyse des donnees d'expression genique
US6671625B1 (en) 1999-02-22 2003-12-30 Vialogy Corp. Method and system for signal detection in arrayed instrumentation based on quantum resonance interferometry
US6713257B2 (en) 2000-08-25 2004-03-30 Rosetta Inpharmatics Llc Gene discovery using microarrays
EP1447446A1 (en) * 2001-10-22 2004-08-18 Takara Bio Inc. Method of labeling nucleic acids
US6780589B1 (en) 1999-02-22 2004-08-24 Vialogy Corp. Method and system using active signal processing for repeatable signal amplification in dynamic noise backgrounds
EP1625394A2 (en) * 2003-04-23 2006-02-15 Bioseek, Inc. Methods for analysis of biological dataset profiles
EP1719051A1 (en) * 2004-02-27 2006-11-08 Bioseek, Inc. Biological dataset profiling of asthma and atopy
US7418351B2 (en) 2002-01-31 2008-08-26 Rosetta Inpharmatics Llc Methods for analysis of measurement errors in measured signals
US7565251B2 (en) 1998-12-28 2009-07-21 Rosetta Inpharmatics Llc Systems and methods for evaluating the significance of differences in biological measurements
JP2011117978A (ja) * 2001-04-20 2011-06-16 Yale Univ 細胞および組織の自動分析のための方法
US8019552B2 (en) 2004-03-05 2011-09-13 The Netherlands Cancer Institute Classification of breast cancer patients using a combination of clinical criteria and informative genesets
US8105777B1 (en) 2008-02-13 2012-01-31 Nederlands Kanker Instituut Methods for diagnosis and/or prognosis of colon cancer
WO2012135845A1 (en) 2011-04-01 2012-10-04 Qiagen Gene expression signature for wnt/b-catenin signaling pathway and use thereof
US8484000B2 (en) 2004-09-02 2013-07-09 Vialogy Llc Detecting events of interest using quantum resonance interferometry

Families Citing this family (96)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6456942B1 (en) * 1998-01-25 2002-09-24 Combimatrix Corporation Network infrastructure for custom microarray synthesis and analysis
US6990221B2 (en) * 1998-02-07 2006-01-24 Biodiscovery, Inc. Automated DNA array image segmentation and analysis
SE9801420D0 (sv) * 1998-04-22 1998-04-22 Mikael Kubista Metod för karakterisering av enstaka testprover
EP1147229A2 (en) * 1999-02-02 2001-10-24 Bernhard O. Palsson Methods for identifying drug targets based on genomic sequence data
US6731781B1 (en) * 1999-09-30 2004-05-04 Biodiscovery, Inc. System and method for automatically processing microarrays
US7099502B2 (en) * 1999-10-12 2006-08-29 Biodiscovery, Inc. System and method for automatically processing microarrays
KR20030045780A (ko) * 2000-08-03 2003-06-11 어레이젯 리미티드 잉크 젯 프린트헤드에 의한 고도의 병렬 마이크로어레이의제조
US20020107640A1 (en) * 2000-11-14 2002-08-08 Ideker Trey E. Methods for determining the true signal of an analyte
US20030130798A1 (en) * 2000-11-14 2003-07-10 The Institute For Systems Biology Multiparameter integration methods for the analysis of biological networks
US7127379B2 (en) * 2001-01-31 2006-10-24 The Regents Of The University Of California Method for the evolutionary design of biochemical reaction networks
JP2004533037A (ja) * 2001-03-01 2004-10-28 ザ・レジェンツ・オブ・ザ・ユニバーシティ・オブ・カリフォルニア 調節された反応ネットワークの全体的特性を決定するためのモデルおよび方法
AU2002254162A1 (en) * 2001-03-08 2002-09-24 Chromavision Medical Systems, Inc. Apparatus and method for labeling rows and columns in an irregular array
AU2002307486A1 (en) * 2001-04-26 2002-11-11 Rosetta Inpharmatics, Inc. Methods and compositions for utilizing changes of hybridization signals during approach to equilibrium
US20030014232A1 (en) * 2001-05-22 2003-01-16 Paterson Thomas S. Methods for predicting biological activities of cellular constituents
US20030104426A1 (en) * 2001-06-18 2003-06-05 Linsley Peter S. Signature genes in chronic myelogenous leukemia
US6691042B2 (en) 2001-07-02 2004-02-10 Rosetta Inpharmatics Llc Methods for generating differential profiles by combining data obtained in separate measurements
US6768961B2 (en) * 2001-09-14 2004-07-27 Yield Dynamics, Inc. System and method for analyzing error information from a semiconductor fabrication process
US20030073085A1 (en) * 2001-10-05 2003-04-17 Fang Lai Amplifying expressed sequences from genomic DNA of higher-order eukaryotic organisms for DNA arrays
US7751981B2 (en) * 2001-10-26 2010-07-06 The Regents Of The University Of California Articles of manufacture and methods for modeling Saccharomyces cerevisiae metabolism
AU2002350131A1 (en) * 2001-11-09 2003-05-26 Gene Logic Inc. System and method for storage and analysis of gene expression data
WO2003068928A2 (en) * 2002-02-11 2003-08-21 Syngenta Participations Ag Gene function inferring using gene expression data
US20060088831A1 (en) * 2002-03-07 2006-04-27 University Of Utah Research Foundation Methods for identifying large subsets of differentially expressed genes based on multivariate microarray data analysis
US20030224363A1 (en) * 2002-03-19 2003-12-04 Park Sung M. Compositions and methods for modeling bacillus subtilis metabolism
US8229673B2 (en) * 2002-03-29 2012-07-24 Genomatica, Inc. Human metabolic models and methods
US8949032B2 (en) * 2002-03-29 2015-02-03 Genomatica, Inc. Multicellular metabolic models and methods
US7856317B2 (en) * 2002-06-14 2010-12-21 Genomatica, Inc. Systems and methods for constructing genomic-based phenotypic models
WO2004031885A2 (en) * 2002-08-01 2004-04-15 Gene Logic Inc. Method and system for managing and querying gene expression data according to quality
US7512496B2 (en) * 2002-09-25 2009-03-31 Soheil Shams Apparatus, method, and computer program product for determining confidence measures and combined confidence measures for assessing the quality of microarrays
AU2003222214B2 (en) * 2002-10-15 2010-08-12 The Regents Of The University Of California Methods and systems to identify operational reaction pathways
US7869957B2 (en) * 2002-10-15 2011-01-11 The Regents Of The University Of California Methods and systems to identify operational reaction pathways
US7996155B2 (en) 2003-01-22 2011-08-09 Microsoft Corporation ANOVA method for data analysis
JP2004254298A (ja) * 2003-01-30 2004-09-09 Ricoh Co Ltd 画像処理装置、プログラム及び記憶媒体
US8301388B2 (en) * 2003-05-05 2012-10-30 Amplicon Express, Inc. Pool and superpool matrix coding and decoding designs and methods
US20040229226A1 (en) * 2003-05-16 2004-11-18 Reddy M. Parameswara Reducing microarray variation with internal reference spots
JP2006525811A (ja) * 2003-05-16 2006-11-16 ロゼッタ インファーマティクス エルエルシー Rna干渉の方法と組成物
JP2004348674A (ja) * 2003-05-26 2004-12-09 Noritsu Koki Co Ltd 領域検出方法及びその装置
US20050143628A1 (en) * 2003-06-18 2005-06-30 Xudong Dai Methods for characterizing tissue or organ condition or status
US8321137B2 (en) * 2003-09-29 2012-11-27 Pathwork Diagnostics, Inc. Knowledge-based storage of diagnostic models
US20050069863A1 (en) * 2003-09-29 2005-03-31 Jorge Moraleda Systems and methods for analyzing gene expression data for clinical diagnostics
EP2469439A2 (en) 2003-09-29 2012-06-27 Pathwork Diagnostics, Inc. Systems and methods for detecting biological features
JP2005106755A (ja) * 2003-10-01 2005-04-21 Research Organization Of Information & Systems マイクロアレイ実験等から得られるデータの新規解析方法
US7519565B2 (en) * 2003-11-03 2009-04-14 Cloudmark, Inc. Methods and apparatuses for classifying electronic documents
US20050149546A1 (en) * 2003-11-03 2005-07-07 Prakash Vipul V. Methods and apparatuses for determining and designating classifications of electronic documents
US20050094807A1 (en) * 2003-11-04 2005-05-05 John Silzel Accuracy array assay system and method
ES2360113T3 (es) * 2003-12-23 2011-06-01 Genomic Health, Inc. Amplificación universal de rna fragmentado.
US7454293B2 (en) * 2004-01-07 2008-11-18 University Of Hawai'i Methods for enhanced detection and analysis of differentially expressed genes using gene chip microarrays
US7881872B2 (en) * 2004-03-12 2011-02-01 Microsoft Corporation Methods of analyzing multi-channel profiles
US7660709B2 (en) * 2004-03-18 2010-02-09 Van Andel Research Institute Bioinformatics research and analysis system and methods associated therewith
US7653260B2 (en) * 2004-06-17 2010-01-26 Carl Zeiss MicroImaging GmbH System and method of registering field of view
US8582924B2 (en) * 2004-06-30 2013-11-12 Carl Zeiss Microimaging Gmbh Data structure of an image storage and retrieval system
US7542854B2 (en) * 2004-07-22 2009-06-02 International Business Machines Corporation Method for discovering gene regulatory models and genetic networks using relational fuzzy models
EP1910536B1 (en) * 2005-07-26 2009-09-09 Council Of Scientific And Industrial Research Methods for identifying genes that increase yeast stress tolerance, and use of these genes for yeast strain improvement
US7437249B2 (en) * 2006-06-30 2008-10-14 Agilent Technologies, Inc. Methods and systems for detrending signal intensity data from chemical arrays
WO2008115497A2 (en) * 2007-03-16 2008-09-25 Gene Security Network System and method for cleaning noisy genetic data and determining chromsome copy number
US20090023182A1 (en) * 2007-07-18 2009-01-22 Schilling Christophe H Complementary metabolizing organisms and methods of making same
US20100292093A1 (en) * 2007-10-18 2010-11-18 Rubinstein Wendy S Breast cancer profiles and methods of use thereof
CN101250584B (zh) * 2008-03-19 2012-06-13 南京大学 一种识别显著差异表达基因集合的方法
US8086502B2 (en) 2008-03-31 2011-12-27 Ebay Inc. Method and system for mobile publication
US7991646B2 (en) 2008-10-30 2011-08-02 Ebay Inc. Systems and methods for marketplace listings using a camera enabled mobile device
US8404242B2 (en) 2009-03-16 2013-03-26 Atyr Pharma, Inc. Compositions and methods comprising histidyl-tRNA synthetase splice variants having non-canonical biological activities
US8825660B2 (en) * 2009-03-17 2014-09-02 Ebay Inc. Image-based indexing in a network-based marketplace
WO2010120509A2 (en) 2009-03-31 2010-10-21 Atyr Pharma, Inc. Compositions and methods comprising aspartyl-trna synthetases having non-canonical biological activities
US8861844B2 (en) * 2010-03-29 2014-10-14 Ebay Inc. Pre-computing digests for image similarity searching of image-based listings in a network-based publication system
US8949252B2 (en) 2010-03-29 2015-02-03 Ebay Inc. Product category optimization for image similarity searching of image-based listings in a network-based publication system
US8819052B2 (en) 2010-03-29 2014-08-26 Ebay Inc. Traffic driver for suggesting stores
US9792638B2 (en) 2010-03-29 2017-10-17 Ebay Inc. Using silhouette images to reduce product selection error in an e-commerce environment
US9405773B2 (en) * 2010-03-29 2016-08-02 Ebay Inc. Searching for more products like a specified product
JP6066900B2 (ja) 2010-04-26 2017-01-25 エータイアー ファーマ, インコーポレイテッド システイニルtRNA合成酵素のタンパク質フラグメントに関連した治療用、診断用および抗体組成物の革新的発見
AU2011248614B2 (en) 2010-04-27 2017-02-16 Pangu Biopharma Limited Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of isoleucyl tRNA synthetases
CA2797271C (en) 2010-04-28 2021-05-25 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of alanyl trna synthetases
EP2563383B1 (en) 2010-04-29 2017-03-01 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of valyl trna synthetases
WO2011139854A2 (en) 2010-04-29 2011-11-10 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of asparaginyl trna synthetases
WO2011135459A2 (en) 2010-04-29 2011-11-03 Medical Prognosis Institute A/S Methods and devices for predicting treatment efficacy
CN103140233B (zh) 2010-05-03 2017-04-05 Atyr 医药公司 与甲硫氨酰‑tRNA合成酶的蛋白片段相关的治疗、诊断和抗体组合物的发现
US9034321B2 (en) 2010-05-03 2015-05-19 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of phenylalanyl-alpha-tRNA synthetases
CA2797277C (en) 2010-05-03 2021-02-23 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of arginyl-trna synthetases
CN102985103A (zh) 2010-05-04 2013-03-20 Atyr医药公司 与p38多-tRNA合成酶复合物相关的治疗、诊断和抗体组合物的创新发现
JP6396656B2 (ja) 2010-05-14 2018-09-26 エータイアー ファーマ, インコーポレイテッド フェニルアラニルβtRNA合成酵素のタンパク質フラグメントに関連した治療用、診断用および抗体組成物の革新的発見
US9034598B2 (en) 2010-05-17 2015-05-19 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of leucyl-tRNA synthetases
CN103096913B (zh) 2010-05-27 2017-07-18 Atyr 医药公司 与谷氨酰胺酰‑tRNA合成酶的蛋白片段相关的治疗、诊断和抗体组合物的创新发现
AU2011261486B2 (en) 2010-06-01 2017-02-23 Pangu Biopharma Limited Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of lysyl-tRNA synthetases
US8407221B2 (en) 2010-07-09 2013-03-26 International Business Machines Corporation Generalized notion of similarities between uncertain time series
JP6116479B2 (ja) 2010-07-12 2017-04-19 エータイアー ファーマ, インコーポレイテッド グリシルtRNA合成酵素のタンパク質フラグメントに関連した治療用、診断用および抗体組成物の革新的発見
WO2012027611A2 (en) 2010-08-25 2012-03-01 Atyr Pharma, Inc. INNOVATIVE DISCOVERY OF THERAPEUTIC, DIAGNOSTIC, AND ANTIBODY COMPOSITIONS RELATED TO PROTEIN FRAGMENTS OF TYROSYL-tRNA SYNTHETASES
US8412594B2 (en) 2010-08-28 2013-04-02 Ebay Inc. Multilevel silhouettes in an online shopping environment
EP2714927B1 (en) 2011-06-01 2016-08-10 Medical Prognosis Institute A/S Methods and devices for prognosis of cancer relapse
WO2013123432A2 (en) 2012-02-16 2013-08-22 Atyr Pharma, Inc. Histidyl-trna synthetases for treating autoimmune and inflammatory diseases
US9934522B2 (en) 2012-03-22 2018-04-03 Ebay Inc. Systems and methods for batch- listing items stored offline on a mobile device
US9336302B1 (en) 2012-07-20 2016-05-10 Zuci Realty Llc Insight and algorithmic clustering for automated synthesis
BR112015012239B1 (pt) 2012-11-27 2022-07-19 Pontificia Universidad Católica De Chile Método in vitro de diagnóstico de câncer de tireoide
WO2014172390A2 (en) 2013-04-15 2014-10-23 Cedars-Sinai Medical Center Methods for detecting cancer metastasis
US10392667B2 (en) 2013-06-07 2019-08-27 Medical Prognosis Institute A/S Methods and devices for predicting treatment efficacy of fulvestrant in cancer patients
EP3074039A4 (en) 2013-11-26 2017-10-11 The Brigham and Women's Hospital, Inc. Compositions and methods for modulating an immune response
DK3169815T3 (da) 2014-07-15 2021-02-15 Ontario Institute For Cancer Res Fremgangsmåder og indretninger til forudsigelse af anthracyclinbehandlingseffektivitet
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis
US20180165424A1 (en) * 2016-12-14 2018-06-14 Exxonmobil Research And Engineering Company Method for dynamic bias management between online process analyzers and referee tests

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5552270A (en) * 1991-03-18 1996-09-03 Institut Molekulyarnoi Biologii Imeni V.A. Methods of DNA sequencing by hybridization based on optimizing concentration of matrix-bound oligonucleotide and device for carrying out same
US5965352A (en) * 1998-05-08 1999-10-12 Rosetta Inpharmatics, Inc. Methods for identifying pathways of drug action

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US130215A (en) * 1872-08-06 Improvement in steam and air brakes
US164273A (en) * 1875-06-08 Improvement in hand-stamps
US5800992A (en) 1989-06-07 1998-09-01 Fodor; Stephen P.A. Method of detecting nucleic acids
US5143854A (en) 1989-06-07 1992-09-01 Affymax Technologies N.V. Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof
US5155916A (en) * 1991-03-21 1992-10-20 Scientific Drilling International Error reduction in compensation of drill string interference for magnetic survey tools
US5807522A (en) 1994-06-17 1998-09-15 The Board Of Trustees Of The Leland Stanford Junior University Methods for fabricating microarrays of biological samples
US5777888A (en) 1995-08-09 1998-07-07 Regents Of The University Of California Systems for generating and analyzing stimulus-response output signal matrices
US5569588A (en) 1995-08-09 1996-10-29 The Regents Of The University Of California Methods for drug screening
JP2002515738A (ja) * 1996-01-23 2002-05-28 アフィメトリックス,インコーポレイティド 核酸分析法
US6165709A (en) * 1997-02-28 2000-12-26 Fred Hutchinson Cancer Research Center Methods for drug target screening
PT1078256E (pt) * 1998-04-22 2003-04-30 Imaging Res Inc Processo para avaliar ensaios quimicos e biologicos
US6218122B1 (en) * 1998-06-19 2001-04-17 Rosetta Inpharmatics, Inc. Methods of monitoring disease states and therapies using gene expression profiles
US6171794B1 (en) 1998-07-13 2001-01-09 Rosetta Inpharmatics, Inc. Methods for determining cross-hybridization
US6174794B1 (en) 1998-08-20 2001-01-16 Advanced Micro Devices, Inc. Method of making high performance MOSFET with polished gate and source/drain feature
US6146830A (en) 1998-09-23 2000-11-14 Rosetta Inpharmatics, Inc. Method for determining the presence of a number of primary targets of a drug
US6245517B1 (en) 1998-09-29 2001-06-12 The United States Of America As Represented By The Department Of Health And Human Services Ratio-based decisions and the quantitative analysis of cDNA micro-array images
US6203987B1 (en) * 1998-10-27 2001-03-20 Rosetta Inpharmatics, Inc. Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns
US6950752B1 (en) 1998-10-27 2005-09-27 Rosetta Inpharmatics Llc Methods for removing artifact from biological profiles
US6453241B1 (en) * 1998-12-23 2002-09-17 Rosetta Inpharmatics, Inc. Method and system for analyzing biological response signal data
US6351712B1 (en) 1998-12-28 2002-02-26 Rosetta Inpharmatics, Inc. Statistical combining of cell expression profiles
US6230987B1 (en) * 2000-05-23 2001-05-15 Hai Quang Truong Applicators for allowing a predetermined fluid flow for dissolving and distributing soluble substances

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1141411A4 *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8521441B2 (en) 1998-12-28 2013-08-27 Microsoft Corporation Method and computer program product for reducing fluorophore-specific bias
US7966130B2 (en) 1998-12-28 2011-06-21 Microsoft Corporation Systems and methods for determining a weighted mean intensity
US7565251B2 (en) 1998-12-28 2009-07-21 Rosetta Inpharmatics Llc Systems and methods for evaluating the significance of differences in biological measurements
US6671625B1 (en) 1999-02-22 2003-12-30 Vialogy Corp. Method and system for signal detection in arrayed instrumentation based on quantum resonance interferometry
US6780589B1 (en) 1999-02-22 2004-08-24 Vialogy Corp. Method and system using active signal processing for repeatable signal amplification in dynamic noise backgrounds
WO2001029261A3 (en) * 1999-10-15 2002-05-23 Du Pont A method for high-density microarray mediated gene expression profiling
WO2001029261A2 (en) * 1999-10-15 2001-04-26 E.I. Du Pont De Nemours And Company A method for high-density microarray mediated gene expression profiling
US6607885B1 (en) 1999-10-15 2003-08-19 E. I. Du Pont De Nemours And Company Method for high-density microarray medicated gene expression profiling
WO2002004676A3 (en) * 2000-07-10 2005-04-28 Incyte Genomics Inc Composite and averaged hybridizations
WO2002004676A2 (en) * 2000-07-10 2002-01-17 Incyte Genomics, Inc. Composite and averaged hybridizations
JP2002065259A (ja) * 2000-08-24 2002-03-05 Shinya Watanabe 核酸標識方法および核酸標識用キット
US6713257B2 (en) 2000-08-25 2004-03-30 Rosetta Inpharmatics Llc Gene discovery using microarrays
WO2002052038A3 (en) * 2000-12-27 2003-01-16 Geneka Biotechnology Inc Method for normalizing the relative intensities of detection signals in hybridization arrays
WO2002052038A2 (en) * 2000-12-27 2002-07-04 Geneka Biotechnology Inc. Method for normalizing the relative intensities of detection signals in hybridization arrays
JP2011117978A (ja) * 2001-04-20 2011-06-16 Yale Univ 細胞および組織の自動分析のための方法
EP1447446A1 (en) * 2001-10-22 2004-08-18 Takara Bio Inc. Method of labeling nucleic acids
EP1447446A4 (en) * 2001-10-22 2006-12-20 Takara Bio Inc PROCESS FOR MARKING NUCLEIC ACIDS
US7418351B2 (en) 2002-01-31 2008-08-26 Rosetta Inpharmatics Llc Methods for analysis of measurement errors in measured signals
WO2003070938A1 (fr) * 2002-02-21 2003-08-28 Ajinomoto Co., Inc. Analyseur de donnees d'expression genique et procede, programme et support d'enregistrement pour l'analyse des donnees d'expression genique
EP1625394A4 (en) * 2003-04-23 2008-02-06 Bioseek Inc METHOD FOR ANALYZING BIOLOGICAL DATA PROFILES
EP1625394A2 (en) * 2003-04-23 2006-02-15 Bioseek, Inc. Methods for analysis of biological dataset profiles
EP1719051A4 (en) * 2004-02-27 2009-06-17 Bioseek Inc BIOLOGICAL RECORDING OF ASTHMA AND ATOPY
EP1719051A1 (en) * 2004-02-27 2006-11-08 Bioseek, Inc. Biological dataset profiling of asthma and atopy
US8019552B2 (en) 2004-03-05 2011-09-13 The Netherlands Cancer Institute Classification of breast cancer patients using a combination of clinical criteria and informative genesets
US8484000B2 (en) 2004-09-02 2013-07-09 Vialogy Llc Detecting events of interest using quantum resonance interferometry
US8105777B1 (en) 2008-02-13 2012-01-31 Nederlands Kanker Instituut Methods for diagnosis and/or prognosis of colon cancer
WO2012135845A1 (en) 2011-04-01 2012-10-04 Qiagen Gene expression signature for wnt/b-catenin signaling pathway and use thereof

Also Published As

Publication number Publication date
US8521441B2 (en) 2013-08-27
CA2356696A1 (en) 2000-07-06
US20050130215A1 (en) 2005-06-16
US7966130B2 (en) 2011-06-21
EP1141411A4 (en) 2007-05-02
EP1141411A1 (en) 2001-10-10
CN1335893A (zh) 2002-02-13
US20060190191A1 (en) 2006-08-24
AU774830B2 (en) 2004-07-08
US6351712B1 (en) 2002-02-26
US7565251B2 (en) 2009-07-21
CA2356696C (en) 2011-08-02
US20030093227A1 (en) 2003-05-15
US20020128781A1 (en) 2002-09-12
US20050164273A1 (en) 2005-07-28
AU2385500A (en) 2000-07-31
JP2002533701A (ja) 2002-10-08

Similar Documents

Publication Publication Date Title
CA2356696C (en) Statistical combining of cell expression profiles
US7897750B2 (en) Strategies for gene expression analysis
Deyholos et al. High‐density microarrays for gene expression analysis
ZA200103848B (en) Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns.
US20050214824A1 (en) Methods for monitoring the expression of alternatively spliced genes
Burgess Gene expression studies using microarrays
WO2000024936A1 (en) Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns
JP2016165286A (ja) Gene expression profiling with a reduced number of transcript measurements
US20230416826A1 (en) Target-enriched multiplexed parallel analysis for assessment of fetal dna samples
US20060281126A1 (en) Methods for monitoring the expression of alternatively spliced genes
US7371516B1 (en) Methods for determining the specificity and sensitivity of oligonucleotides for hybridization
EP1200625A1 (en) Methods for determining the specificity and sensitivity of oligonucleotides for hybridization
EP1141415A1 (en) Methods for robust discrimination of profiles
KR20010081098A (ko) Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns
Lockhart et al. DNA arrays and gene expression analysis in the brain
WO2017120750A1 (zh) A DNA chip for SNPs in non-coding regions across the whole genome of East Asian populations
WO2007087302A2 (en) Oligonucleotide matrix and methods of use
Compton et al. Gene Expression Profiling

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 99816329.5

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase

Ref document number: 2356696

Country of ref document: CA

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 23855/00

Country of ref document: AU

ENP Entry into the national phase

Ref document number: 2000 591227

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: IN/PCT/2001/988/CHE

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: 1999967594

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1999967594

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWG Wipo information: grant in national office

Ref document number: 23855/00

Country of ref document: AU