US20020128781A1 - Statistical combining of cell expression profiles - Google Patents

Statistical combining of cell expression profiles

Publication number
US20020128781A1
Authority
US
United States
Prior art keywords
microarray
fluorophore
pool
cellular constituent
genetic matter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/058,696
Other languages
English (en)
Inventor
Roland Stoughton
Hongyue Dai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rosetta Inpharmatics LLC
Microsoft Technology Licensing LLC
Original Assignee
Rosetta Inpharmatics LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rosetta Inpharmatics LLC
Priority to US10/058,696
Publication of US20020128781A1
Priority to US10/287,130 (US7966130B2)
Priority to US11/042,653 (US20050130215A1)
Priority to US11/042,654 (US8521441B2)
Priority to US11/303,121 (US7565251B2)
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC. Assignment of assignors interest (see document for details). Assignors: MICROSOFT CORPORATION

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6834: Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837: Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10: TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10T: TECHNICAL SUBJECTS COVERED BY FORMER US CLASSIFICATION
    • Y10T436/00: Chemistry: analytical and immunological testing
    • Y10T436/10: Composition for standardization, calibration, simulation, stabilization, preparation or preservation; processes of use in preparation for chemical testing

Definitions

  • the field of this invention relates to methods for using data from multiple repeated experiments to generate a confidence value for each data point, increase sensitivity, and eliminate systematic experimental bias.
  • Cellular constituents include gene expression levels, abundance of mRNA encoding specific genes, and protein expression levels in a biological system.
  • Levels of various constituents of a cell, such as mRNA encoding genes and/or protein expression levels, are known to change in response to drug treatments and other perturbations of the cell's biological state. Measurements of a plurality of such “cellular constituents” therefore contain a wealth of information about the effect of perturbations on the cell's biological state. The collection of such measurements is generally referred to as the “profile” of the cell's biological state.
  • LC/MS/MS: microcolumn reversed-phase liquid chromatography electrospray ionization tandem mass spectrometry
  • Mortensen et al. describe a method for producing embryonic stem (ES) cell lines whereby both alleles are inactivated by homologous recombination. Using the methods of Mortensen et al., it is possible to obtain homozygous mutationally altered cells, i.e., double knockouts of ES cell lines. Mortensen et al. propose that their method may be generally applicable to other genes and to cell lines other than ES cells. Mortensen et al. 1992, Production of homozygous mutant ES cells with a single targeting construct, Cell Biol. 12:2391-2395.
  • Comparison of profiles with other profiles in a database can give clues to the molecular targets of drugs and related functions, efficacy and toxicity of drug candidates and/or pharmacological agents. Such comparisons may also be used to derive consensus profiles representative of ideal drug activities or disease states. Profile comparison can also help detect diseases in a patient at an early stage and provide improved clinical outcome projections for a patient diagnosed with a disease.
  • This invention provides solutions for minimizing the number of times a cellular constituent quantification experiment must be repeated in order to produce data that has acceptable error levels. Accordingly, the methods of the present invention provide a novel method for fluorophore bias removal. This allows for the attenuation of fluorophore specific biases to acceptable levels based on only two nominal repeats of a cellular constituent quantification experiment.
  • the present invention further provides methods for combining nominal repeats of a cellular constituent quantification experiment based on rank order of up-regulation or down-regulation. In these methods, cellular constituent up- or down-regulation data determined from nominal repeats of cellular constituent quantification experiments are expressed by a novel metric that is free of intensity dependent errors. Application of this metric before combining based on rank order provides a powerful method for removing error from weakly expressing cellular constituents without an excessive number of nominal repetitions of the expensive cellular constituent quantification experiment.
  • Another aspect of the present invention is an improved method for computing a weighted average of individual cellular constituent measurements in nominally repeated cellular constituent quantification experiments.
  • a novel method for calculating the error associated with each cellular constituent measurement is provided.
  • the error bar in the weighted average is sharply attenuated.
  • these improved methods for computing a weighted average are applicable to two-fluorophore (two-color) or single fluorophore (one-color) protocols.
  • One embodiment of the present invention provides a method of fluorophore bias removal comprising the steps of:
  • Another embodiment of the invention provides a method for determining a probability that an expression level of a cellular constituent in a plurality of paired differential microarray experiments is altered by a perturbation, wherein each paired differential microarray experiment in said plurality of paired differential microarray experiments comprises a first microarray experiment representing a baseline state of a first biological system, and a second microarray experiment representing a perturbed state of said first biological system, said method comprising the steps of
  • step (c) determining said probability that said expression level of said cellular constituent in said plurality of paired differential microarray experiments is altered by said perturbation by combining said amount of change in expression level of said cellular constituent determined in step (b) for each paired differential microarray experiment in said plurality of paired differential microarray experiments using a rank based method.
  • Yet another embodiment of the invention is a method for determining a weighted mean differential intensity in an expression level of a cellular constituent in a biological system in response to a perturbation, the method comprising:
  • x is the weighted mean differential intensity of the cellular constituent
  • x_i is a differential expression measurement of the cellular constituent i
  • σ_i^2 is a corresponding error for mean differential intensity x_i.
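The weighted-mean formula itself is not reproduced in this text. A minimal sketch, assuming it takes the standard inverse-variance form x = Σ(x_i/σ_i^2) / Σ(1/σ_i^2), might look as follows in Python. The function name and the returned variance 1/Σ(1/σ_i^2) are illustrative assumptions, not taken from the patent.

```python
from typing import Sequence, Tuple


def weighted_mean(x: Sequence[float], var: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean of repeated measurements.

    x   -- differential measurements x_i from nominal repeats
    var -- corresponding error variances sigma_i^2 (must be positive)
    Returns (weighted mean, variance of the weighted mean).
    Assumption: the standard inverse-variance weighting is used.
    """
    if len(x) != len(var) or len(x) == 0:
        raise ValueError("x and var must be non-empty and of equal length")
    weights = [1.0 / v for v in var]          # w_i = 1 / sigma_i^2
    total = sum(weights)
    mean = sum(w * xi for w, xi in zip(weights, x)) / total
    return mean, 1.0 / total                  # error shrinks as repeats are added


mean, variance = weighted_mean([0.0, 10.0], [1.0, 4.0])
```

Under this assumption, equal errors reduce the result to the plain average, while a repeat with a large σ_i^2 contributes little to the combined value.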
  • FIG. 1 depicts some sources of measurement error present in microarray fluorescent images.
  • A depicts unevenly printed DNA probe spots.
  • B depicts the effects of scratches, dust, and artifacts.
  • C depicts how spot positions drift away from a nominal measuring grid.
  • D depicts the effects of unevenness in the brightness across the microarray due to uneven hybridization.
  • E depicts the effects of color stripes on the microarray due to fluorophore-specific biases.
  • FIG. 2 illustrates the effect of deleting genes responsible for the production of calcineurin protein in the yeast S. cerevisiae (CNA1 and CNA2).
  • the figure contrasts the response profile of two yeast cultures, a native culture (Culture 1) and a culture in which CNA1 and CNA2 have been deleted (Culture 2).
  • the horizontal axis is the log10 of the intensity of the individual hybridized spots on the microarray obtained from the two yeast cultures, and therefore represents mRNA species abundance.
  • the vertical axis is the log10 of the ratio of the intensity measured for one fluorescent label (Culture 1) to that measured for the other label (Culture 2) (expression ratio).
  • FIG. 3 depicts the intensity-dependent bias that occurs in cell expression profile experiments due to variance in fluorophore optical detection efficiencies as well as variance in fluorophore incorporation efficiencies.
  • FIG. 4A is a color ratio vs. intensity plot for an experiment in which both cultures were the same background strain of the yeast S. cerevisiae. Genes with a distinct bias between a red and green fluorophore are flagged.
  • FIG. 4B is the same experiment as depicted in FIG. 4A except that usage of the red and green fluorophores is reversed.
  • FIG. 4C depicts the bias removal process of the invention, wherein FIG. 4A and FIG. 4B are combined to produce a response profile free of fluorophore-specific biases.
  • FIG. 5 compares two identical response profiles that were performed under identical experimental conditions. The figure shows that experimental errors decrease as a function of intensity (expression level). Intensity independent contour lines illustrate a component of the error correction methods of the present invention.
  • FIG. 6 a shows a typical signature plot for a single experiment with the drug Cyclosporin A.
  • FIG. 6 b shows the results of applying a weighted average according to the methods of the present invention to four repeats of the experiment depicted in FIG. 6 a.
  • FIG. 7 illustrates a computer system useful for embodiments of the invention.
  • Perturbation is the experimental or environmental condition(s) associated with a biological system. Perturbations may be achieved by exposure of a biological system to a drug candidate or pharmacologic agent, the introduction of an exogenous gene into a biological system, the deletion of a gene from the biological system, changes in the culture conditions of the biological system, or any other art recognized method of perturbing a biological system. Further, perturbation of a biological system may be achieved by the onset of disease in the biological system.
  • Genetic Matter refers to nucleic acids such as messenger RNA (“mRNA”), complementary DNA (“cDNA”), genomic DNA (“gDNA”), DNA, RNA, genes, oligonucleotides, gene fragments, and any combination thereof.
  • Fluorophore-labeled genetic matter refers to genetic matter that has been labeled with a fluorescently-labeled probe (“fluorophore”).
  • Fluorophores include, but are not limited to, fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others (see, e.g., Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press San Diego, Calif.).
  • This DNA may be prepared by reverse transcription of mRNA or by (PCR/IVT) or (IVT) with use of fluorophores, as those skilled in the art will appreciate. See, e.g., Gelder et al., 1990, “Amplified RNA synthesized from limited quantities of heterogeneous cDNA,” Proc. Natl. Acad. Sci. USA 87:1663-1667.
  • PCR refers to the Polymerase Chain Reaction.
  • Biological System is broadly defined to include any cell, tissue, organ or multicellular organism.
  • a biological system can be a cell line, a cell culture, a tissue sample obtained from a subject, a Homo sapiens, a mammal, a yeast substantially isogenic to Saccharomyces cerevisiae, or any other art recognized biological system.
  • the state of a biological system can be measured by the content, activities or structures of its cellular constituents.
  • the state of a biological system is determined by the state of a collection of cellular constituents, which are sufficient to characterize the cell or organism for an intended purpose including characterizing the effects of a drug or other perturbation.
  • cellular constituent encompasses any kind of measurable biological variable.
  • the measurements and/or observations made on the state of these constituents can be of their abundances (i.e., amounts or concentrations in a biological system), their activities, their states of modification (e.g., phosphorylation), or other art recognized measurements relevant to the physiological state of a biological system.
  • this invention includes making such measurements and/or observations on different collections of cellular constituents. These different collections of cellular constituents are also called aspects of the biological state of a biological system.
  • One aspect of the biological state of a biological system is its transcriptional state.
  • the transcriptional state of a biological system includes the identities and abundances of the constituent RNA species, especially mRNAs, in the cell under a given set of conditions. Often, a substantial fraction of all constituent RNA species in the biological system are measured, but at least a sufficient fraction is measured to characterize the action of a drug or other perturbation of interest.
  • the transcriptional state of a biological system can be conveniently determined by measuring cDNA abundances by any of several existing gene expression technologies. DNA arrays for measuring mRNA or transcript level of a large number of genes can be employed to ascertain the biological state of a system.
  • the translational state of a biological system includes the identities and abundances of the constituent protein species in the biological system under a given set of conditions. Preferably a substantial fraction of all constituent protein species in the biological system is measured, but at least a sufficient fraction is measured to characterize the action of a drug of interest.
  • the transcriptional state is often representative of the translational state.
  • the activity state of a biological system includes the activities of the constituent protein species (and also optionally catalytically active nucleic acid species) in the biological system under a given set of conditions.
  • the translational state is often representative of the activity state.
  • This invention is also adaptable, where relevant, to “mixed” aspects of the biological state of a biological system in which measurements of different aspects of the biological state of a biological system are combined. For example, in one mixed aspect, the abundances of certain RNA species and of certain protein species, are combined with measurements of the activities of certain other protein species. Further, it will be appreciated from the following that this invention is also adaptable to any other aspect of a biological state of a biological system that is measurable.
  • the biological state of a biological system can be represented by a profile of some number of cellular constituents.
  • a profile of cellular constituents can be represented by the vector S.
  • S i is the level of the i'th cellular constituent, for example, the transcript level of gene i, or alternatively, the abundance or activity level of protein i.
  • Microarrays. Determining the relative abundance of diverse individual sequences in complex DNA samples is often accomplished using microarrays. See, e.g., Shalon et al., 1996, “A Microarray System for Analyzing Complex Samples Using Two-color Fluorescent Probe Hybridization,” Genome Research 6:639-645. Frequently, transcript arrays are produced by hybridizing detectably labeled polynucleotides representing the mRNA transcripts present in a cell (e.g., fluorescently labeled cDNA synthesized from total cell mRNA) to a microarray.
  • a microarray is a surface with an ordered array of binding (e.g., hybridization) sites for products of many of the genes in the genome of a cell or organism, preferably most or almost all of the genes.
  • Microarrays are highly reproducible and therefore multiple copies of a given array can be produced and the nominal copies can be compared with each other.
  • microarrays are small, usually smaller than 5 cm², and made from materials that are stable under binding (e.g., nucleic acid hybridization) conditions.
  • a given binding site or unique set of binding sites in the microarray will specifically bind the product of a single gene in the cell.
  • the level of hybridization to the site in the array corresponding to any particular gene will reflect the prevalence in the cell of mRNA transcribed from that gene.
  • a gene for which the encoded mRNA is prevalent will have a relatively strong signal.
  • Microarrays are advantageous because nucleic acids representing two different pools of nucleic acid can be hybridized to a microarray and the relative signal from each pool can simultaneously be measured.
  • Each pool of nucleic acids may represent the state of a biological system before and after a perturbation.
  • a first nucleic acid pool may be derived from a mRNA pool from a cell culture before exposing the cell culture to a pharmacological agent and a second cDNA pool may be derived from a mRNA pool derived from the same culture after exposing the culture to a pharmacological agent.
  • the two pools of cDNA could represent pathway responses.
  • a first cDNA library could be derived from the mRNA of a first aliquot (“pool”) of a cell culture that has been exposed to a pathway perturbation and a second cDNA library can be derived from the mRNA of a second aliquot (“pool”) of the same cell culture wherein the second aliquot was not exposed to the pathway perturbation.
  • microarray experiments, including those described in this section, are referred to as “differential microarray experiments”.
  • One skilled in the art will appreciate that many forms of differential microarray experiments other than the ones outlined in this disclosure are within the scope of the definition of “differential microarray experiments”.
  • a differential intensity measurement refers to measurements made in differential microarray experiments.
  • a differential intensity measurement could be the difference between the brightness of a position on a microarray, which corresponds to a cellular constituent of interest, after (i) the microarray has been contacted with DNA derived from a biological system that represents a baseline state and (ii) the microarray has been contacted with DNA derived from a biological system that represents a perturbed state.
  • the baseline state of a biological system may represent the wild-type state of the biological system.
  • the baseline state of a biological system could represent a different perturbed state of the biological system.
  • Each microarray experiment in a differential microarray experiment, or repeated differential microarray experiment preferably utilizes the same or similar microarray.
  • Microarrays are considered similar if they are prepared from substantially isogenic biological systems and a majority of the binding spots on each microarray are common.
  • the microarray used in repeated microarray experiments may be the same identical microarray, wherein the microarray is washed between microarray experiments, or the microarray(s) used in repeated microarray experiments may be exact replicas of each other, or they may be similar to each other.
  • each cDNA pool is distinctively labeled with a different dye if the two-fluorophore microarray format is chosen.
  • each cDNA pool is labeled by deriving fluorescently-labeled cDNA by reverse transcription of polyA+ RNA in the presence of Cy3-(green) or Cy5-(red) deoxynucleotide triphosphates (Amersham).
  • the two cDNA pools are mixed and hybridized to the microarray, the relative intensity of signal from each cDNA set is determined for each site on the array, and any relative difference in abundance of a particular mRNA is detected.
  • the fluorescence emissions at each site of a microarray can be determined using scanning confocal laser microscopy.
  • a separate scan using the appropriate excitation line, is carried out for each of the two fluorophores used.
  • a laser may be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores can be analyzed simultaneously (See e.g. Shalon et al., supra).
  • the microarrays may be scanned with a laser fluorescent scanner with a computer controlled X-Y stage and a microscope objective.
  • Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser and the emitted light is split by wavelength and detected with two photomultiplier tubes.
  • Fluorescence laser scanning devices are described in Schena et al., 1996, Genome Res. 6:639-645 and in references cited herein.
  • the fiber-optic bundle described by Ferguson et al., 1996, Nature Biotech. 14:1681-1684, may be used to monitor mRNA abundance levels at a large number of sites simultaneously.
  • Signals may be recorded and analyzed by computer, e.g., using a 12 bit analog to digital board.
  • the scanned image may be despeckled using a graphics program (e.g., Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridization at each wavelength at each site. If necessary, an experimentally determined correction for “cross talk” (or overlap) between the channels for the two fluorophores may be made.
  • microarray experiments refers to the general class of experiments that are described in this section.
  • microarray experiments may include the use of a single fluorophore rather than the two-fluorophore example described infra.
  • microarray experiments may be paired. If paired, the first microarray experiment in the pair could represent a nominal biological system representing a baseline state. The second microarray experiment in the pair could represent the nominal biological system after it has been subjected to a perturbation. Thus comparison of the paired microarray experiment would reveal changes in the state of the nominal biological system based upon the perturbation.
  • these pairs of microarray experiments are referred to as “differential microarray experiments”.
  • Cell Expression Profiles. An advantage of using two different cDNA pools in microarray experiments is that a direct and internally controlled comparison of the mRNA levels corresponding to each arrayed gene in two cell states can be made. This and related techniques for quantitative measurement of cellular constituents are generally referred to as cell constituent profiling.
  • Cell constituent profiling is typically expressed as changes, either in absolute level or the ratio of levels, between two known cell conditions, such as a response to treatment of a baseline state with a pharmacological agent, as described in the previous section.
  • a ratio of the emission of the two fluorophores may be calculated for any particular hybridization site on a DNA transcript array. This ratio is independent of the absolute expression level of the cognate gene, but is useful for genes whose expression is significantly modulated by drug administration, gene deletion, or any other perturbation.
  • two-fluorophore cell expression profiles are typically plotted on an x-y graph.
  • the horizontal axis represents the log10 of the ratio of the mean intensity (which approximately reflects the level of expression of a corresponding mRNA derived from a gene) between the first and second pool of cDNA for each site on the microarray.
  • the vertical axis represents the log10 of the ratio of the intensity measured for one fluorescent label, corresponding to the first pool of cDNA, to that measured for the other fluorescent label, corresponding to the second pool of cDNA, for each hybridization site on the microarray.
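As a rough illustration of the two plotted quantities, the sketch below computes one (intensity, ratio) point per hybridization site. It assumes that "mean intensity" means the arithmetic mean of the two channel intensities, which this text does not state explicitly; the function name is hypothetical.

```python
import math
from typing import Tuple


def signature_point(intensity_x: float, intensity_y: float) -> Tuple[float, float]:
    """Return (log10 mean intensity, log10 color ratio) for one array site.

    intensity_x -- brightness measured for the first fluorescent label
    intensity_y -- brightness measured for the second fluorescent label
    Assumption: the horizontal coordinate uses the arithmetic mean of the
    two channels; both intensities must be positive.
    """
    if intensity_x <= 0 or intensity_y <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    mean_intensity = 0.5 * (intensity_x + intensity_y)
    return math.log10(mean_intensity), math.log10(intensity_x / intensity_y)


horizontal, vertical = signature_point(1000.0, 100.0)
```

A site with equal brightness in both channels lands on the horizontal line at zero, and a tenfold difference lands at +1 or -1 on the vertical axis.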
  • subscripts 1 and 2 refer to two independently extracted mRNA cultures in which abundances are being compared
  • a_1(k) is the abundance of species k in mRNA culture 1;
  • a_2(k) is the abundance of species k in mRNA culture 2;
  • subscripts X and Y represent the two different fluorescent labels used.
  • r_{X/Y} is the color ratio that ideally reflects abundance ratio a_1/a_2.
  • Equation (1) ideally represents the measurement plotted on the vertical axis of FIGS. 2 through 6.
  • the use of fluorophore-labeled deoxynucleotide triphosphates affects the efficiency with which mRNA is reverse transcribed into cDNA and the efficiency with which the fluorophore-labeled cDNA hybridizes to the microarray.
  • the precise amount a specific fluorophore affects the transcription or hybridization efficiency is highly dependent upon the precise molecular structure of the fluorophore used.
  • r_{X/Y} is the color ratio
  • subscripts X and Y are the two fluorescent labels
  • E_X(k) is the efficiency of fluorescent label X.
  • E_Y(k) is the efficiency of fluorescent label Y.
  • r_{X/Y}(rev) is the color ratio in the reverse experiment.
  • Equation (4) can be written equivalently using ratios as found in equations (1)-(3) instead of differences of log ratios.
  • changes in constituent levels are most appropriately expressed as the logarithm of the ratio of abundance in the pair of conditions forming the differential measurement. This is because fold changes are biologically more meaningful than changes in absolute level.
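Equations (1) through (4) are not reproduced in this text. The following sketch assumes a multiplicative efficiency model, in which the forward experiment measures (a_1/a_2)·(E_X/E_Y) and the fluorophore-reversed experiment measures (a_2/a_1)·(E_X/E_Y). Under that assumption, half the difference of the two log ratios recovers log(a_1/a_2), and half their sum isolates the label bias. The function name is hypothetical.

```python
import math
from typing import Tuple


def remove_fluorophore_bias(r_forward: float, r_reversed: float) -> Tuple[float, float]:
    """Combine a forward and a label-reversed color ratio for one gene.

    r_forward  -- color ratio r_{X/Y} with culture 1 labeled X, culture 2 labeled Y
    r_reversed -- color ratio r_{X/Y}(rev) with the labels swapped
    Returns (log10 abundance ratio a_1/a_2, log10 label bias E_X/E_Y).
    Assumption: label efficiencies enter multiplicatively and are the same
    in both experiments.
    """
    log_fwd = math.log10(r_forward)
    log_rev = math.log10(r_reversed)
    log_abundance_ratio = 0.5 * (log_fwd - log_rev)   # bias terms cancel
    log_bias = 0.5 * (log_fwd + log_rev)              # abundance terms cancel
    return log_abundance_ratio, log_bias


# Simulated gene: true a_1/a_2 = 4, label bias E_X/E_Y = 2.
log_ratio, log_bias = remove_fluorophore_bias(4.0 * 2.0, 0.25 * 2.0)
```

In the simulated case the combined value equals log10(4) regardless of the bias, which is the behavior the text attributes to pairing a nominal repeat with its fluorophore-reversed counterpart.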
  • FIG. 4 illustrates the bias removal method of the present invention.
  • FIG. 4 a is a color ratio vs. intensity plot for a two-color hybridization experiment in which the two cultures used are nominally the same background strain of the yeast S. cerevisiae. Because the two cultures are nominally the same, it is expected that individual spots on the microarray would fluoresce with the same amount of intensity for both of the fluorophores used. Experimental methods are described in the experimental section infra. However, as is readily apparent from FIG. 4 a, some of the spots on the microarray exhibit fluorophore-specific intensity. For example, spots on the microarray, corresponding to various genes in the yeast S.
  • FIG. 4 b shows the result of the fluorophore-reversed version of the experiment plotted in FIG. 4 a .
  • the flagged genes in FIG. 4 b now have opposite bias.
  • FIG. 4 c shows the result of combining the data of FIGS. 4 a and 4 b according to the methods of the present invention described above. The biases of the flagged genes have been greatly reduced.
  • This rank combining has the advantage that it does not require any modeling of the detailed error behavior in the underlying hybridization experiments, other than the assumption of no systematic biases.
  • the rank based method is an example of a non-parametric statistical test for the significance of observed up- or down-regulations.
  • Percentile rankings such as equations (5) and (6) are based upon the assumption that the underlying error behavior is similar for all genes. This is not necessarily the case.
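Equations (5) and (6) are not reproduced in this text, so the sketch below is only one plausible rank-based scheme and not necessarily the patent's. It assumes each repeat supplies one regulation value per gene, converts the values to fractional ranks (1/N for the most up-regulated gene), and multiplies the fractions across repeats. If repeats are independent and a gene is not regulated, the chance that it ranks at least this high in every repeat equals that product. All names are hypothetical.

```python
from typing import List


def fractional_ranks(values: List[float]) -> List[float]:
    """Fraction of genes ranked at or above each gene (1/N = most up-regulated).

    Ties are broken by input order, a simplification made for this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * n
    for position, gene in enumerate(order, start=1):
        ranks[gene] = position / n
    return ranks


def combined_up_score(repeats: List[List[float]]) -> List[float]:
    """Product of per-repeat fractional ranks; small values suggest consistent
    up-regulation. Treated here as a score, not a calibrated probability."""
    n_genes = len(repeats[0])
    scores = [1.0] * n_genes
    for values in repeats:
        if len(values) != n_genes:
            raise ValueError("every repeat must report the same genes")
        for gene, fraction in enumerate(fractional_ranks(values)):
            scores[gene] *= fraction
    return scores


scores = combined_up_score([[3.0, 0.1, -2.0, 0.5], [2.5, -0.2, -1.0, 0.3]])
```

A down-regulation score could be obtained the same way after negating the inputs. Because only ranks are used, no model of the error distribution is needed, but the values being ranked should have similar error behavior for all genes.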
  • FIG. 5 plots the expression ratio of two nominal repeats of the same experiment
  • the weakly expressing genes as expressed by log10(intensity)
  • the more weakly a particular gene is expressed, the greater the tendency of the log10(expression ratio) of the gene from two nominal repeats of an experiment to deviate from zero.
  • the low-abundance (weakly expressing and hence low-intensity hybridization) genes will tend to occupy the tails of the distribution of expression ratios (i.e. deviate from zero in accordance with FIG. 5) more often than the higher-abundance genes.
  • X and Y are the brightness for a probe spot on the microarray with respect to the X and Y fluorophores
  • σ_X^2 is a variance term for X and represents the additive error level in the X channel
  • σ_Y^2 is a variance term for Y and represents the additive error level in the Y channel
  • f is the fractional multiplicative error level
  • the first fluorophore (X) may optionally represent a biological system in a base line state whereas the second fluorophore (Y) may represent the biological system in a perturbed state.
  • the fractional multiplicative error, f, is empirically derived by fitting the denominator of equation (7) to the measured data.
  • the denominator of Equation (7) is the expected standard error of the numerator, so d has unit variance. d is therefore an error distribution statistic that is independent of intensity, and therefore applicable to rank methods. Any other definition with the non-parametric properties of equation (7) is also a good variable to use in the rank methods.
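Equation (7) is not reproduced in this text. A minimal sketch, assuming the additive variances and the fractional multiplicative error combine in quadrature so that d = (X - Y) / sqrt(σ_X^2 + σ_Y^2 + f^2·(X^2 + Y^2)), is given below. The exact form and the function name are assumptions.

```python
import math


def error_normalized_difference(x: float, y: float,
                                sigma_x: float, sigma_y: float,
                                f: float) -> float:
    """Difference of channel intensities divided by its expected standard error.

    x, y             -- spot brightness in the X and Y channels
    sigma_x, sigma_y -- additive error levels (standard deviations) per channel
    f                -- fractional multiplicative error level
    Assumption: additive and multiplicative errors are independent, so their
    variances add.
    """
    variance = sigma_x ** 2 + sigma_y ** 2 + f ** 2 * (x ** 2 + y ** 2)
    if variance <= 0:
        raise ValueError("the error model must give a positive variance")
    return (x - y) / math.sqrt(variance)


d = error_normalized_difference(15.0, 5.0, sigma_x=3.0, sigma_y=4.0, f=0.0)
```

At low intensity the additive terms dominate the denominator, and at high intensity the multiplicative term does, which is how a statistic of this form can stay comparable across the intensity range.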
  • the denominator of equation (7) is used to generate the intensity independent contour lines shown in FIG. 5.
  • the choice of using grid lines of ±1 standard deviation according to the denominator of equation (7) is completely arbitrary.
  • the contour lines could be gridded at any convenient value such as 0.25σ, 0.5σ, 20σ as long as the contour lines are plotted in accordance with the denominator of equation (7) or a similar nonparametric representation of error.
  • contour lines follow the error envelope.
  • the errors are distributed with respect to the contours similarly at low and at high intensity, and d has the desired property.
  • One advantage of plotting contour lines is that the amount of error associated with each cellular constituent measured on the microarray can be calculated based on information derived from the variance of all the cellular constituents on the microarray across a plurality of measurements. Thus, by using grid lines as plotted in FIG.
  • $\sigma_{X-Y}$ is the standard error (rms uncertainty) associated with that gene. This uncertainty may be derived from repeated control experiments where X and Y are derived from the same biological system, in which case $\sigma_{X-Y}$ is the observed standard deviation of X − Y for that gene over the set of experiments. This definition of d then is similarly distributed for all genes, and equations (5) and (6) may be used with ranking d.
  • Each $\sigma_i^2$ in equation (8) may be determined in a variety of ways.
  • One approach is to calculate the error envelope for a microarray experiment using two nominal repeats of the two-fluorophore microarray experiment in which the only difference between the two experiments is that the two fluorophores utilized are reversed. See e.g. FIG. 4.
  • only one fluorophore could be utilized. Therefore, there could be no difference at all in the two nominal repeats that are paired in order to determine an error envelope.
  • FIG. 5 also illustrates intensity independent contour lines that are fitted in accordance with the denominator of equation (7).
  • the intensity ($x_i$) is plotted on the appropriate reference plot, such as FIG. 5. For example, in FIG. 5, the intensity of individual measurements would be plotted along the horizontal axis. Once the horizontal position is determined, $\sigma_i^2$ is calculated based upon the width of the ±1σ intensity independent contour lines at position $x_i$ on the reference plot.
  • Equation (10) transitions from Equation (9) to the value of the observed scatter, $s_j$, as the number of repeats, N, becomes large.
  • $s_j$ is calculated according to traditional statistical methods, such that $$s_j = \sqrt{\frac{1}{N-1}\sum_i \left(x_i - \bar{x}\right)^2} \qquad (11)$$
  • An estimate of the error of the mean, $\bar{x}$, as described by equation (10) is necessary because equations such as (11) require a large number of nominal repeats (N) in order to be a true reflection of error.
  • Estimates of error based on equation (9) do not take into consideration the errors to which particular measurements are susceptible, as illustrated in FIG. 1, or gene-specific anomalies.
  • equations that accomplish the transition from equation (9) to equation (10) are possible.
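The observed scatter of equation (11) can be sketched as below, assuming it is the usual sample standard deviation over N nominal repeats. The function name and the repeat values are hypothetical. Equation (10), which blends this scatter with the a priori error estimate, is not reproduced in this section and is not implemented here.

```python
import math

def observed_scatter(values):
    """Sample standard deviation (N - 1 in the denominator) of repeated measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two repeats to estimate scatter")
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

# Hypothetical log10 expression ratios for one gene over four nominal repeats.
repeats = [0.10, 0.30, 0.20, 0.40]
s_j = observed_scatter(repeats)
```

With only a few repeats this estimate is itself noisy, which is why the text calls for an error estimate that relies on it only as N becomes large.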
  • FIG. 6 illustrates the reduction in error obtained with repeated experiments, and the consequent gain in information.
  • FIG. 6 a is the signature plot for a single experiment with the drug CsA, obtained as described in the experimental section infra.
  • the responses of a biological system to a perturbation can be measured by observing the changes in the biological state of the biological system.
  • a response profile is a collection of changes of cellular constituents.
  • the response profile of a biological system (e.g., a cell or cell culture) to the perturbation m may be defined as the vector $v^{(m)}$:
  • $v_i^{(m)}$ is the amplitude of response of cellular constituent i under the perturbation m.
  • biological response to the application of a pharmacological agent is measured by the induced change in the transcript level of at least 2 genes, preferably more than 10 genes, more preferably more than 100 genes and most preferably more than 1,000 genes.
  • biological response profiles comprise simply the difference between biological variables before and after perturbation.
  • the biological response is defined as the ratio of cellular constituents before and after a perturbation is applied.
  • $v_i^{(m)}$ is set to zero if the response of gene i is below some threshold amplitude or confidence level determined from knowledge of the measurement error behavior. In such embodiments, those cellular constituents whose measured responses are lower than the threshold are given the response value of zero, whereas those cellular constituents whose measured responses are greater than the threshold retain their measured response values.
  • This truncation of the response vector is suitable when most of the smaller responses are expected to be greatly dominated by measurement error. After the truncation, the response vector $v^{(m)}$ also approximates a ‘matched detector’ (see, e.g., Van Trees, 1968, Detection, Estimation, and Modulation Theory Vol.
  • genes whose transcript level changes are lower than two fold or more preferably four fold are given the value of zero.
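The truncation described above can be sketched as follows, assuming responses are stored as log10 expression ratios so that a two-fold change corresponds to |log10 ratio| ≥ log10(2). The function name, the treatment of a response exactly at the threshold, and the sample values are hypothetical.

```python
import math

def truncate_response(log10_ratios, min_fold=2.0):
    """Set responses below the fold-change threshold to zero; keep the rest unchanged."""
    threshold = math.log10(min_fold)
    return [r if abs(r) >= threshold else 0.0 for r in log10_ratios]

# Hypothetical response vector: only the second and third entries exceed two-fold.
response = [0.05, -0.4, 0.7, -0.1]
truncated = truncate_response(response)
```

Passing `min_fold=4.0` applies the stricter four-fold threshold mentioned in the text.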
  • perturbations are applied at several levels of strength. For example, different amounts of a drug may be applied to a biological system to observe its response.
  • the perturbation responses may be interpolated by approximating each by a single parameterized “model” function of the perturbation strength u.
  • the adjustable parameters are selected independently for each cellular constituent of the perturbation response.
  • the adjustable parameters are selected for each cellular constituent so that the sum of the squares of the differences between the model function (e.g., the Hill function, Equation 13) and the corresponding experimental data at each perturbation strength is minimized.
  • This preferable parameter adjustment method is known in the art as a least squares fit.
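A least squares fit of a Hill-type model to one cellular constituent's responses can be sketched as below. Equation 13 is not reproduced in this section, so the three-parameter form a·(u/u0)^n / (1 + (u/u0)^n) is an assumption, as are all names and values. The exhaustive grid search stands in for whatever optimizer would be used in practice.

```python
def hill(u, a, u0, n):
    """Hill-type response: saturates at a and is half-maximal at u = u0."""
    ratio = (u / u0) ** n
    return a * ratio / (1.0 + ratio)

def sum_sq_error(params, strengths, responses):
    """Sum of squared differences between the model and the data."""
    a, u0, n = params
    return sum((hill(u, a, u0, n) - r) ** 2 for u, r in zip(strengths, responses))

def fit_hill(strengths, responses, a_grid, u0_grid, n_grid):
    """Return the grid point (a, u0, n) with the smallest squared error."""
    candidates = ((a, u0, n) for a in a_grid for u0 in u0_grid for n in n_grid)
    return min(candidates, key=lambda p: sum_sq_error(p, strengths, responses))

# Hypothetical titration: noise-free data generated from a = 2, u0 = 1, n = 2.
strengths = [0.1, 0.5, 1.0, 2.0, 10.0]
responses = [hill(u, 2.0, 1.0, 2.0) for u in strengths]
best = fit_hill(strengths, responses,
                a_grid=[1.0, 1.5, 2.0, 2.5],
                u0_grid=[0.5, 1.0, 2.0],
                n_grid=[1.0, 2.0, 3.0])
```

Because the parameters are selected independently for each cellular constituent, the same fit would simply be repeated once per constituent.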
  • Other possible model functions are based on polynomial fitting. More detailed description of model fitting and biological response has been disclosed in Friend and Stoughton, Methods of Determining Protein Activity Levels Using Gene Expression Profiles, U.S. Provisional Application Serial No. 60/084,742, filed on May 8, 1998, which is incorporated herein by reference in its entirety for all purposes.
  • the methods of the invention are useful for comparing augmented profiles that contain any number of response profiles and/or projected profiles. Projected profiles are best understood after a discussion of genesets, which are co-regulated genes. Projected profiles are useful for analyzing many types of cellular constituents including genesets.
  • Genes tend to increase or decrease their rates of transcription together when they possess similar regulatory sequence patterns, i.e., transcription factor binding sites. This is the mechanism for coordinated response to particular signaling inputs (see, e.g., Madhani and Fink, 1998, The riddle of MAP kinase signaling specificity, Trends in Genetics 14:151-155; Arnone and Davidson, 1997, The hardwiring of development: organization and function of genomic regulatory systems, Development 124:1851-1864). Separate genes which make different components of a necessary protein or cellular structure will tend to co-vary. Duplicated genes (see, e.g., Wagner, 1996, Genetic redundancy caused by gene duplications and its evolution in networks of transcriptional regulators, Biol. Cybern.
  • a preferred embodiment for identifying such basis genesets involves clustering algorithms (for reviews of clustering algorithms, see, e.g., Fukunaga, 1990 , Statistical Pattern Recognition 2nd Ed., Academic Press, San Diego; Everitt, 1974 , Cluster Analysis , London: Heinemann Educ. Books; Hartigan, 1975 , Clustering Algorithms , New York: Wiley; Sneath and Sokal, 1973 , Numerical Taxonomy , Freeman; Anderberg, 1973 , Cluster Analysis for Applications , Academic Press: New York).
  • in cluster analysis, the expression of a large number of genes is monitored as biological systems are subjected to a wide variety of perturbations.
  • a table of data containing the gene expression measurements is used for cluster analysis.
  • Cluster analysis operates on a table of data which has the dimension m × k wherein m is the total number of conditions or perturbations and k is the number of genes measured.
  • I(x, y) is the distance between gene X and gene Y; $X_i$ and $Y_i$ are the gene expression responses under perturbation i.
  • the Euclidean distance may be squared to place progressively greater weight on objects that are further apart.
  • $X_i$ and $Y_i$ are gene expression responses under perturbation i.
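The two distance measures named above can be sketched as below. The distance equations themselves are not reproduced in this section, so the standard Euclidean definitions are assumed. The function names and response values are hypothetical.

```python
import math

def squared_euclidean_distance(x, y):
    """Sum over perturbations i of (X_i - Y_i)**2."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def euclidean_distance(x, y):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean_distance(x, y))

# Hypothetical responses of two genes under two perturbations.
gene_x = [0.0, 3.0]
gene_y = [4.0, 0.0]
d = euclidean_distance(gene_x, gene_y)
d_squared = squared_euclidean_distance(gene_x, gene_y)
```

Here the plain distance is 5 and the squared distance is 25, showing how squaring gives progressively greater weight to genes that are further apart.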
  • Various cluster linkage rules are useful for defining genesets.
  • Single linkage, a nearest neighbor method, determines the distance between the two closest objects.
  • complete linkage methods determine distance by the greatest distance between any two objects in the different clusters. This method is particularly useful in cases when genes or other cellular constituents form naturally distinct “clumps.”
  • the unweighted pair-group average defines distance as the average distance between all pairs of objects in two different clusters. This method is also very useful for clustering genes or other cellular constituents to form naturally distinct “clumps.”
  • the weighted pair-group average method may also be used. This method is the same as the unweighted pair-group average method except that the size of the respective clusters is used as a weight.
  • This method is particularly useful for embodiments where the cluster size is suspected to be greatly varied (Sneath and Sokal, 1973 , Numerical taxonomy , San Francisco: W. H. Freeman & Co.).
  • Other cluster linkage rules such as the unweighted and weighted pair-group centroid and Ward's method are also useful for some embodiments of the invention. See, e.g., Ward, 1963, J. Am. Stat. Assn. 58:236; Hartigan, 1975, Clustering algorithms, New York: Wiley.
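The single, complete and unweighted pair-group average linkage rules described above differ only in how they reduce the pairwise distances between two clusters. A minimal sketch follows, assuming Euclidean distance between response vectors; all names and values are hypothetical, and the weighted, centroid and Ward variants are not shown.

```python
import math
from itertools import product

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pairwise(cluster_a, cluster_b):
    """All distances between one object in cluster_a and one in cluster_b."""
    return [euclidean(a, b) for a, b in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Distance between the two closest objects (nearest neighbor)."""
    return min(pairwise(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Greatest distance between any two objects in the different clusters."""
    return max(pairwise(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Unweighted pair-group average: mean distance over all pairs."""
    distances = pairwise(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical clusters of genes, each gene measured under one perturbation.
cluster_a = [[0.0], [1.0]]
cluster_b = [[3.0], [5.0]]
```

For these two clusters the three rules give 2, 5 and 3.5 respectively, which illustrates how strongly the choice of rule can change which clusters are merged first.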
  • Genesets may be defined based on the many smaller branches of a tree, or a small number of larger branches by cutting across the tree at different levels. The choice of cut level may be made to match the number of distinct response pathways expected. If little or no prior information is available about the number of pathways, then the tree should be divided into as many branches as are truly distinct. ‘Truly distinct’ may be defined by a minimum distance value between the individual branches. Typical values are in the range 0.2 to 0.4 where 0 is perfect correlation and 1 is zero correlation, but may be larger for poorer quality data or fewer experiments in the training set, or smaller in the case of better data and more experiments in the training set.
  • ‘truly distinct’ may be defined with an objective test of statistical significance for each bifurcation in the tree.
  • the Monte Carlo randomization of the experiment index for each cellular constituent's responses across the set of experiments is used to define an objective test.
  • the objective test is defined in the following manner:
  • $D_k$ is the square of the distance measure for constituent k with respect to the center (mean) of its assigned cluster.
  • Superscript 1 or 2 indicates whether it is with respect to the center of the entire branch or with respect to the center of the appropriate cluster out of the two subclusters.
  • the distribution of fractional improvements obtained from the Monte Carlo procedure is an estimate of the distribution under the null hypothesis that a given branching was not significant.
  • the actual fractional improvement for that branching with the unpermuted data is then compared to the cumulative probability distribution from the null hypothesis to assign significance.
  • Standard deviations are derived by fitting a log normal model for the null hypothesis distribution. Using this procedure, a standard deviation greater than about 2, for example, indicates that the branching is significant at the 95% confidence level.
  • Genesets defined by cluster analysis typically have underlying biological significance.
  • Another aspect of the cluster analysis method provides the definition of basis vectors for use in profile projection described in the following sections.
  • a set of basis vectors V has k × n dimensions, where k is the number of genes and n is the number of genesets.
  • $$V = \begin{bmatrix} V_1^{(1)} & \cdots & V_1^{(n)} \\ \vdots & \ddots & \vdots \\ V_k^{(1)} & \cdots & V_k^{(n)} \end{bmatrix} \qquad (17)$$
  • the elements $V_k^{(n)}$ are normalized so that each basis vector $V^{(n)}$ has unit length by dividing by the square root of the number of genes in geneset n. This produces basis vectors which are not only orthogonal (the genesets derived from cutting the clustering tree are disjoint), but also orthonormal (unit length). With this choice of normalization, random measurement errors in profiles project onto the $V^{(n)}$ in such a way that the amplitudes tend to be comparable for each n. Normalization prevents large genesets from dominating the results of similarity calculations.
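A minimal sketch of building normalized basis vectors from disjoint genesets is given below. It assumes each geneset is a list of gene indices and that member genes receive equal weight before normalization. For convenience each basis vector is stored as one row, i.e., the transpose of the k × n layout of Equation (17). All names and values are hypothetical.

```python
import math

def geneset_basis(genesets, n_genes):
    """Return one unit-length vector of length n_genes per geneset."""
    basis = []
    for members in genesets:
        norm = math.sqrt(len(members))  # square root of the geneset size
        vector = [0.0] * n_genes
        for k in members:
            vector[k] = 1.0 / norm
        basis.append(vector)
    return basis

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical genesets over five genes: one with four members, one with a single member.
V = geneset_basis([[0, 1, 2, 3], [4]], n_genes=5)
```

Because the genesets are disjoint, the dot product of different basis vectors is zero, and the division by the square root of the geneset size gives each vector unit length regardless of how many genes it contains.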
  • Genesets can also be defined based upon the mechanism of the regulation of genes. Genes whose regulatory regions have the same transcription factor binding sites are more likely to be co-regulated. In some preferred embodiments, the regulatory regions of the genes of interest are compared using multiple alignment analysis to decipher possible shared transcription factor binding sites (Stormo and Hartzell, 1989, Identifying protein binding sites from unaligned DNA fragments, Proc Natl Acad Sci 86:1183-1187; Hertz and Stormo, 1995, Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps, Proc of 3rd Intl Conf on Bioinformatics and Genome Research, Lim and Cantor, eds., World Scientific Publishing Co., Ltd. Singapore, pp. 201-216). For example, as Example 3, infra, shows, a common promoter sequence responsive to Gcn4 in 20 genes may be responsible for those 20 genes being co-regulated over a wide variety of perturbations.
  • Co-regulated genes are not limited to those with binding sites for the same transcriptional factor.
  • Co-regulated (co-varying) genes may be in the up-stream/down-stream relationship where the products of up-stream genes regulate the activity of down-stream genes. It is well known to those of skill in the art that there are numerous varieties of gene regulation networks. One of skill in the art also understands that the methods of this invention are not limited to any particular kind of gene regulation mechanism. If it can be derived from the mechanism of regulation that two genes are co-regulated in terms of their activity change in response to perturbation, the two genes may be clustered into a geneset.
  • K-means clustering may be used to cluster genesets when the regulation of genes of interest is partially known. K-means clustering is particularly useful in cases where the number of genesets is predetermined by the understanding of the regulatory mechanism. In general, K-means clustering is constrained to produce exactly the number of clusters desired. Therefore, if promoter sequence comparison indicates the measured genes should fall into three genesets, K-means clustering may be used to generate exactly three genesets with greatest possible distinction between clusters.
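A minimal K-means sketch with a fixed number of clusters is shown below. It uses the first k profiles as initial centers, a deliberately naive and deterministic choice made for illustration; practical implementations use better initialization and convergence checks. All names and response values are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=20):
    """Assign each point to one of exactly k clusters; returns (labels, centers)."""
    centers = [list(p) for p in points[:k]]  # naive initialization
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical response profiles of six genes under two perturbations.
profiles = [[0.0, 0.1], [5.0, 5.1], [-4.0, 4.0],
            [0.2, -0.1], [5.2, 4.9], [-4.1, 4.2]]
labels, centers = kmeans(profiles, k=3)
```

With k set to three, the six hypothetical genes fall into three genesets of two genes each.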
  • the expression value of genes can be converted into the expression value for genesets. This process is referred to as projection.
  • the projection is as follows:
  • wherein p is the expression profile, P is the projected profile, $P_i$ is the expression value for geneset i, and V is a predefined set of basis vectors.
  • $V_k^{(n)}$ is the amplitude of cellular constituent index k of basis vector n.
  • the value of geneset expression is simply the average of the expression value of the genes within the geneset. In some other embodiments, the average is weighted so that highly expressed genes do not dominate the geneset value. The collection of the expression values of the genesets is the projected profile.
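Projection can be sketched as a dot product of the gene-level profile with each basis vector, together with the simple per-geneset average mentioned above. The projection equation is not reproduced in this section, so the form P_n = Σ_k p_k V_k^(n) is an assumption. The basis, the profile and all names are hypothetical.

```python
def project(profile, basis):
    """Return the projected profile: one value per basis vector (geneset)."""
    return [sum(p * v for p, v in zip(profile, vector)) for vector in basis]

def geneset_average(profile, members):
    """Unweighted mean expression value of the genes in one geneset."""
    return sum(profile[k] for k in members) / len(members)

# Hypothetical orthonormal basis for two genesets over five genes.
basis = [[0.5, 0.5, 0.5, 0.5, 0.0],
         [0.0, 0.0, 0.0, 0.0, 1.0]]
profile = [1.0, 1.0, 1.0, 1.0, -2.0]

projected = project(profile, basis)
average_first = geneset_average(profile, [0, 1, 2, 3])
```

The projected value for the first geneset (2.0) differs from its plain average (1.0) by the factor sqrt(4), which reflects the unit-length normalization of the basis vector.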
  • This definition is the generalized angle cosine between the vectors $P_i$ and $P_j$. It is the projected version of the conventional correlation coefficient between $p_i$ and $p_j$. Profile $p_i$ is deemed most similar to that other profile $p_j$ for which $S_{ij}$ is maximum. New profiles may be classified according to their similarity to profiles of known biological significance, such as the response patterns for known drugs or perturbations in specific biological pathways. Sets of new profiles may be clustered using the distance metric
  • the statistical significance of any observed similarity $S_{ij}$ may be assessed using an empirical probability distribution generated under the null hypothesis of no correlation. This distribution is generated by performing the projection, Equations (19) and (20), for many different random permutations of the constituent index in the original profile p. That is, the ordered set $p_k$ is replaced by $p_{\Pi(k)}$, where $\Pi(k)$ is a permutation, for ~100 to 1000 different random permutations. The probability of the similarity $S_{ij}$ arising by chance is then the fraction of these permutations for which the similarity $S_{ij}$ (permuted) exceeds the similarity observed using the original unpermuted data.
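The similarity measure and its permutation test can be sketched as follows. Equations (19) and (20) are not reproduced in this section, so the cosine form and the choice to permute only the first profile are assumptions. The basis, profiles, seed and function names are hypothetical.

```python
import math
import random

def project(profile, basis):
    return [sum(p * v for p, v in zip(profile, vector)) for vector in basis]

def cosine_similarity(a, b):
    """Generalized angle cosine; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def permutation_p_value(p_i, p_j, basis, n_perm=1000, seed=0):
    """Fraction of random permutations of p_i whose projected similarity exceeds the observed one."""
    rng = random.Random(seed)
    target = project(p_j, basis)
    observed = cosine_similarity(project(p_i, basis), target)
    exceed = 0
    for _ in range(n_perm):
        shuffled = list(p_i)
        rng.shuffle(shuffled)  # random permutation of the constituent index
        if cosine_similarity(project(shuffled, basis), target) > observed:
            exceed += 1
    return observed, exceed / n_perm

# Hypothetical data: two genesets of three genes each.
a = 1.0 / math.sqrt(3.0)
basis = [[a, a, a, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, a, a, a]]
p_i = [1.0, 1.2, 0.8, -1.0, -0.9, -1.1]
p_j = [0.9, 1.1, 1.0, -1.0, -0.8, -1.2]
similarity, p_value = permutation_p_value(p_i, p_j, basis)
```

A small p_value means that few shuffled profiles project as similarly as the real one, so the observed similarity is unlikely to have arisen by chance.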
  • This section provides some exemplary methods for measuring biological responses as well as the procedures necessary to make the reagents used in such methods.
  • Microarrays are known in the art and consist of a surface to which probes that correspond in sequence to gene products (e.g., cDNAs, mRNAs, cRNAs, polypeptides, and fragments thereof) can be specifically hybridized or bound at a known position.
  • the microarray is an array (i.e., a matrix) in which each position represents a discrete binding site for a product encoded by a gene (e.g., a protein or RNA), and in which binding sites are present for products of most or almost all of the genes in the organism's genome.
  • the “binding site” is a nucleic acid or nucleic acid analogue to which a particular cognate cDNA can specifically hybridize.
  • the nucleic acid or analogue of the binding site can be, e.g., a synthetic oligomer, a full-length cDNA, a less-than full length cDNA, or a gene fragment.
  • although the microarray contains binding sites for products of all or almost all genes in the target organism's genome, such comprehensiveness is not necessarily required.
  • the microarray will have binding sites corresponding to at least about 50% of the genes in the genome, often at least about 75%, more often at least about 85%, even more often more than about 90%, and most often at least about 99%.
  • the microarray has binding sites for genes relevant to the action of a drug of interest or in a biological pathway of interest.
  • a “gene” is an open reading frame (ORF) of preferably at least 50, 75, or 99 amino acids from which a messenger RNA is transcribed in the organism (e.g., if a single cell) or in some cell in a multicellular organism.
  • the number of genes in a genome can be estimated from the number of mRNAs expressed by the organism, or by extrapolation from a well-characterized portion of the genome.
  • the number of ORFs can be determined and mRNA coding regions identified by analysis of the DNA sequence.
  • the Saccharomyces cerevisiae genome has been completely sequenced and is reported to have approximately 6275 open reading frames (ORFs) longer than 99 amino acids. Analysis of these ORFs indicates that there are 5885 ORFs that are likely to specify protein products (Goffeau et al., 1996, Life with 6000 genes, Science 274:546-567, which is incorporated by reference in its entirety for all purposes).
  • the human genome is estimated to contain approximately $10^5$ genes.
  • the “binding site” to which a particular cognate cDNA specifically hybridizes is usually a nucleic acid or nucleic acid analogue attached at that binding site.
  • the binding sites of the microarray are DNA polynucleotides corresponding to at least a portion of each gene in an organism's genome. These DNAs can be obtained by, e.g., polymerase chain reaction (PCR) amplification of gene segments from genomic DNA, cDNA (e.g., by RT-PCR), or cloned sequences.
  • PCR primers are chosen, based on the known sequence of the genes or cDNA, that result in amplification of unique fragments (i.e., fragments that do not share more than 10 bases of contiguous identical sequence with any other fragment on the microarray).
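The uniqueness criterion above (no more than 10 bases of contiguous identical sequence shared with any other fragment) can be checked by looking for shared 11-mers. The sketch below compares fragments on the given strand only and ignores reverse complements, which is a simplifying assumption. The fragment names and sequences are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shares_long_run(seq_a, seq_b, max_shared=10):
    """True if the sequences share a contiguous identical run longer than max_shared."""
    k = max_shared + 1
    return not kmers(seq_a, k).isdisjoint(kmers(seq_b, k))

def non_unique_pairs(fragments, max_shared=10):
    """Return name pairs of fragments that violate the uniqueness criterion."""
    names = sorted(fragments)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if shares_long_run(fragments[a], fragments[b], max_shared)]

# Hypothetical fragments: frag1 and frag2 share an 11-base run, frag3 shares only 10.
fragments = {
    "frag1": "GGGACGTTGCAAGTCCC",
    "frag2": "TTTACGTTGCAAGTAAA",
    "frag3": "CCCACGTTGCAAGCAA",
}
clashes = non_unique_pairs(fragments)
```

Primer design software would typically apply a check of this kind to candidate amplicons before the primers are accepted.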
  • Computer programs are useful in the design of primers with the required specificity and optimal amplification properties. In the case of binding sites corresponding to very long genes, it will sometimes be desirable to amplify segments near the 3′ end of the gene so that when oligo-dT primed cDNA probes are hybridized to the microarray, less-than-full length probes will bind efficiently.
  • each gene fragment on the microarray will be between about 50 bp and about 2000 bp, more typically between about 100 bp and about 1000 bp, and usually between about 300 bp and about 800 bp in length.
  • PCR methods are well known and are described, for example, in Innis et al. eds., 1990, PCR Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, Calif., which is incorporated by reference in its entirety for all purposes.
  • nucleic acid for the microarray is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using H-phosphonate or phosphoramidite chemistries (Froehler et al., 1986, Nucleic Acids Res 14:5399-5407; McBride et al., 1983, Tetrahedron Lett. 24:245-248). Synthetic sequences are between about 15 and about 500 bases in length, more typically between about 20 and about 50 bases.
  • synthetic nucleic acids include non-natural bases, e.g., inosine.
  • nucleic acid analogues may be used as binding sites for hybridization.
  • nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al., 1993, PNA hybridizes to complementary oligonucleotides obeying the Watson-Crick hydrogen-bonding rules, Nature 365:566-568; see also U.S. Pat. No. 5,539,083).
  • the binding (hybridization) sites are made from plasmid or phage clones of genes, cDNAs (e.g., expressed sequence tags), or inserts therefrom (Nguyen et al., 1995, Differential gene expression in the murine thymus assayed by quantitative hybridization of arrayed cDNA clones, Genomics 29:207-209).
  • the polynucleotide of the binding sites is RNA.
  • the nucleic acid or analogue are attached to a solid support, which may be made from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, or other materials.
  • a preferred method for attaching the nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al., 1995, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270:467-470. This method is especially useful for preparing microarrays of cDNA.
  • a second preferred method for making microarrays is by making high-density oligonucleotide arrays.
  • Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface using photolithographic techniques for synthesis in situ (see, Fodor et al., 1991, Light-directed spatially addressable parallel chemical synthesis, Science 251:767-773; Pease et al., 1994, Light-directed oligonucleotide arrays for rapid DNA sequence analysis, Proc. Natl. Acad. Sci.
  • oligonucleotides e.g., 20-mers
  • the array produced contains multiple probes against each target transcript.
  • Oligonucleotide probes can be chosen to detect alternatively spliced mRNAs or to serve as various types of control.
  • Another preferred method of making microarrays is by use of an inkjet printing process to synthesize oligonucleotides directly on a solid phase, as described, e.g., in co-pending U.S. patent application Ser. No. 09/008,120 filed on Jan. 16, 1998, by Blanchard entitled “Chemical Synthesis Using Solvent Microdroplets”, which is incorporated by reference herein in its entirety.
  • microarrays e.g., by masking
  • any type of array for example, dot blots on a nylon hybridization membrane (see Sambrook et al., Molecular Cloning—A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989), could be used, although, as will be recognized by those of skill in the art, very small arrays will be preferred because hybridization volumes will be smaller.
  • RNA is extracted from cells of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation (Chirgwin et al., 1979, Biochemistry 18:5294-5299).
  • Poly(A)+ RNA is selected with oligo-dT cellulose (see Sambrook et al., supra).
  • Cells of interest include wild-type cells, drug-exposed wild-type cells, modified cells, and drug-exposed modified cells.
  • Labeled cDNA is prepared from mRNA by oligo dT-primed or random-primed reverse transcription, both of which are well known in the art (see, e.g., Klug and Berger, 1987, Methods Enzymol. 152:316-325). Reverse transcription may be carried out in the presence of a dNTP conjugated to a detectable label, most preferably a fluorescently labeled dNTP.
  • isolated mRNA can be converted to labeled antisense RNA synthesized by in vitro transcription of double-stranded cDNA in the presence of labeled dNTPs (Lockhart et al., 1996, Expression monitoring by hybridization to high-density oligonucleotide arrays, Nature Biotech. 14:1675, which is incorporated by reference in its entirety for all purposes).
  • the cDNA or RNA probe can be synthesized in the absence of detectable label and may be labeled subsequently, e.g., by incorporating biotinylated dNTPs or rNTP, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent.
  • fluorophores include fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others (see, e.g., Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press San Diego, Calif.). It will be appreciated that pairs of fluorophores are chosen that have distinct emission spectra so that they can be easily distinguished.
  • a label other than a fluorescent label is used.
  • a radioactive label or a pair of radioactive labels with distinct emission spectra, can be used (see Zhao et al., 1995, High density cDNA filter analysis: a novel approach for large-scale, quantitative analysis of gene expression, Gene 156:207; Pietu et al., 1996, Novel gene transcripts preferentially expressed in human muscles revealed by quantitative hybridization of a high density cDNA array, Genome Res. 6:492).
  • use of radioisotopes is a less-preferred embodiment.
  • labeled cDNA is synthesized by incubating a mixture containing 0.5 mM dGTP, dATP and dCTP plus 0.1 mM dTTP plus fluorescent deoxyribonucleotides (e.g., 0.1 mM Rhodamine 110 UTP (Perkin Elmer Cetus) or 0.1 mM Cy3 dUTP (Amersham)) with reverse transcriptase (e.g., SuperScript™ II, LTI Inc.) at 42° C. for 60 minutes.
  • nucleic acid hybridization and wash conditions are optimally chosen so that the probe “specifically binds” or “specifically hybridizes” to a specific array site, i.e., the probe hybridizes, duplexes or binds to a sequence array site with a complementary nucleic acid sequence but does not hybridize to a site with a non-complementary nucleic acid sequence.
  • One polynucleotide sequence is considered complementary to another when, if the shorter of the polynucleotides is less than or equal to 25 bases, there are no mismatches using standard base-pairing rules or, if the shorter of the polynucleotides is longer than 25 bases, there is no more than a 5% mismatch.
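The complementarity rule above can be sketched as a mismatch count against the reverse complement. The sketch assumes both sequences are written 5′ to 3′, are already aligned, and have equal length over the duplex region, so "the shorter of the polynucleotides" is simply that common length. It handles only the bases A, C, G and T, and all names and sequences are hypothetical.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    return "".join(_COMPLEMENT[base] for base in reversed(seq))

def is_complementary(probe, site):
    """Apply the rule: no mismatches up to 25 bases, at most 5% mismatches above 25."""
    if len(probe) != len(site):
        raise ValueError("this sketch compares equal-length aligned sequences")
    expected = reverse_complement(site)
    mismatches = sum(1 for a, b in zip(probe, expected) if a != b)
    if len(probe) <= 25:
        return mismatches == 0
    return mismatches <= 0.05 * len(probe)

# Hypothetical 10-base probe and its perfectly matching array site.
probe = "ACGTACGTAC"
site = reverse_complement(probe)
perfect = is_complementary(probe, site)
```

For a 40-base pair of sequences the same function tolerates up to two mismatches, since 5% of 40 is 2.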
  • the polynucleotides are perfectly complementary (no mismatches). It can easily be demonstrated that specific hybridization conditions result in specific hybridization by carrying out a hybridization assay including negative controls (see, e.g., Shalon et al., supra, and Chee et al., supra).
  • Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA, DNA, PNA) of labeled probe and immobilized polynucleotide or oligonucleotide.
  • General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al., supra, and in Ausubel et al., 1987, Current Protocols in Molecular Biology, Greene Publishing and Wiley-Interscience, New York.
  • typical hybridization conditions are hybridization in 5× SSC plus 0.2% SDS at 65° C.
  • FIG. 7 illustrates an exemplary computer system suitable for implementation of the analytic methods of this invention.
  • Computer system 501 is illustrated as comprising internal components and being linked to external components.
  • the internal components of this computer system include processor element 502 interconnected with main memory 503 .
  • computer system 501 can be an Intel 8086-, 80386-, 80486-, or Pentium®-based processor with preferably 32 MB or more of main memory.
  • the external components include mass storage 504 .
  • This mass storage can be one or more hard disks (which are typically packaged together with the processor and memory). Such hard disks are preferably of 1 GB or greater storage capacity.
  • Other external components include user interface device 505, which can be a monitor, together with inputting device 506, which can be a “mouse”, or other graphic input devices (not illustrated), and/or a keyboard.
  • a printing device 508 can also be attached to the computer 501 .
  • computer system 501 is also linked to network link 507 , which can be part of an Ethernet link to other local computer systems, remote computer systems, or wide area communication networks, such as the Internet.
  • This network link allows computer system 501 to share data and processing tasks with other computer systems.
  • Software component 510 represents the operating system, which is responsible for managing computer system 501 and its network interconnections. This operating system can be, for example, of the Microsoft Windows family, such as Windows 3.1, Windows 95, Windows 98, or Windows NT.
  • Software component 511 represents common languages and functions conveniently present on this system to assist programs implementing the methods specific to this invention. Many high or low level computer languages can be used to program the analytic methods of this invention. Instructions can be interpreted during run-time or compiled.
  • Preferred languages include C/C++, FORTRAN and JAVA®.
  • The methods of this invention are programmed in mathematical software packages that allow symbolic entry of equations and high-level specification of processing, including algorithms to be used, thereby freeing a user of the need to procedurally program individual equations or algorithms.
  • Such packages include Matlab from Mathworks (Natick, Mass.), Mathematica from Wolfram Research (Champaign, Ill.), or S-Plus from MathSoft (Cambridge, Mass.).
  • Software component 512 and/or 513 represents the analytic methods of this invention as programmed in a procedural language or symbolic package.
  • A user first loads differential microarray experiment data into computer system 501. This data is loaded into memory from the storage media (504) or from a remote computer, preferably from a dynamic geneset database system, through the network (507). Next, the user causes execution of software that performs the steps of fluorophore bias removal, the rank-based methods of the present invention, or the weighted averaging protocols of the present invention.
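The weighted averaging step named above can be illustrated with a short sketch. This passage does not give the weighting formula, so the example assumes inverse-variance weights, a common choice when each repeated measurement carries its own error estimate. The function name `weighted_mean` and its parameters are hypothetical and are not taken from the specification.

```python
import math


def weighted_mean(values, sigmas):
    """Combine repeated measurements of one gene into a single estimate.

    ASSUMPTION: weights are 1 / sigma**2 (inverse-variance weighting).
    The passage only says "weighted averaging"; this is one plausible form.
    Returns (weighted mean, standard error of the weighted mean).
    """
    if not values or len(values) != len(sigmas):
        raise ValueError("values and sigmas must be non-empty and equal in length")
    if any(s <= 0 for s in sigmas):
        raise ValueError("every sigma must be positive")

    weights = [1.0 / (s * s) for s in sigmas]
    total_weight = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return mean, std_error
```

With equal error estimates the result reduces to the ordinary mean; a measurement with a larger error estimate contributes proportionally less to the combined value.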
  • FK506 was added to a final concentration of 1 μg/ml 0.5 hr after inoculation of the culture.
  • Cyclosporin A (CsA) was added to a concentration of 30 μg/ml.
  • Cells were broken by standard procedures (See e.g. Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. (New York), 12.12.1-13.12.5) with the following modifications.
  • Fluorescently labeled cDNA was prepared, purified and hybridized essentially as described by DeRisi et al. (DeRisi et al., 1997, Exploring the metabolic and genetic control of gene expression on a genomic scale, Science 278:680-686). Briefly, Cy3- or Cy5-dUTP (Amersham) was incorporated into cDNA during reverse transcription (Superscript II, LTI, Inc.) and purified by concentrating to less than 10 μl using Microcon-30 microconcentrators (Amicon).
  • Paired cDNAs were resuspended in 20-26 μl hybridization solution (3× SSC, 0.75 μg/ml poly A DNA, 0.2% SDS) and applied to the microarray under a 22 × 30 mm coverslip for 6 hr at 63° C., all according to DeRisi et al. (1997), supra.
  • PCR products containing common 5′ and 3′ sequences were used as templates with amino-modified forward primer and unmodified reverse primers to PCR amplify 6065 ORFs from the S. cerevisiae genome.
  • First pass success rate was 94%.
  • Amplification reactions that gave products of unexpected sizes were excluded from subsequent analysis.
  • ORFs that could not be amplified from purchased templates were amplified from genomic DNA.
  • DNA samples from 100 μl reactions were isopropanol precipitated, resuspended in water, brought to 3× SSC in a total volume of 15 μl, and transferred to 384-well microtiter plates (Genetix).
  • PCR products were spotted onto 1 × 3 inch polylysine-treated glass slides by a robot built according to specifications provided in Schena et al., supra; DeRisi et al., 1996, Discovery and analysis of inflammatory disease-related genes using microarrays, PNAS USA, 94:2150-2155; and DeRisi et al. (1997). After printing, slides were processed following published protocols. See DeRisi et al. (1997).
  • Microarrays were imaged on a prototype multi-frame CCD camera in development at Applied Precision, Inc. (Seattle, Wash.). Each CCD image frame was approximately 2 mm square. Exposures of 2 sec in the Cy5 channel (white light through Chroma 618-648 nm excitation filter, Chroma 657-727 nm emission filter) and 1 sec in the Cy3 channel (Chroma 535-560 nm excitation filter, Chroma 570-620 nm emission filter) were done consecutively in each frame before moving to the next, spatially contiguous frame. Color isolation between the Cy3 and Cy5 channels was ~100:1 or better. Frames were knitted together in software to make the complete images.
  • The intensities of spots (~100 μm) were quantified from the 10 μm pixels by frame background subtraction and intensity averaging in each channel. Dynamic range of the resulting spot intensities was typically a ratio of 1000 between the brightest spots and the background-subtracted additive error level. Normalization between the channels was accomplished by normalizing each channel to the mean intensities of all genes. This procedure is nearly equivalent to normalization between channels using the intensity ratio of genomic DNA spots (See DeRisi et al., 1997), but is possibly more robust since it is based on the intensities of several thousand spots distributed over the array.
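The channel normalization described above can be sketched as follows. This is a minimal illustration, assuming that each channel is divided by its own mean intensity over all genes and that the expression ratio is then formed as Cy5 over Cy3; the choice of numerator and the function names are assumptions made for the example.

```python
import math


def normalize_channels(cy5, cy3):
    """Scale each channel by its mean intensity over all spots (genes)."""
    if not cy5 or len(cy5) != len(cy3):
        raise ValueError("channels must be non-empty and equal in length")
    mean5 = sum(cy5) / len(cy5)
    mean3 = sum(cy3) / len(cy3)
    return [v / mean5 for v in cy5], [v / mean3 for v in cy3]


def log10_ratios(cy5, cy3):
    """log10 expression ratio per spot after channel normalization.

    ASSUMPTION: the ratio is Cy5 / Cy3; intensities are background-subtracted
    and strictly positive.
    """
    norm5, norm3 = normalize_channels(cy5, cy3)
    return [math.log10(a / b) for a, b in zip(norm5, norm3)]
```

If one channel is uniformly twice as bright as the other, every normalized ratio is 1 and every log ratio is 0, which is the global bias this normalization is meant to remove.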
  • $x_k$ is the $\log_{10}$ of the expression ratio for the $k$'th gene in the x signature
  • $y_k$ is the $\log_{10}$ of the expression ratio for the $k$'th gene in the y signature.
  • The summation is over those genes that were either up- or down-regulated in either experiment at the 95% confidence level. These genes each had a less than 5% chance of being actually unregulated (having expression ratios departing from unity due to measurement errors alone).
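The correlation formula that these definitions belong to is not reproduced in this passage. As a hedged illustration only, the sketch below assumes an uncentered correlation coefficient, the sum of x_k y_k divided by the square root of the product of the sums of x_k squared and y_k squared, restricted to the genes flagged as significantly regulated in either experiment. The function name and the `significant` flag list are hypothetical.

```python
import math


def signature_correlation(x, y, significant):
    """Correlation between two log10-ratio signatures.

    ASSUMPTION: uncentered form (no mean subtraction); the exact formula is
    not given in this passage. `significant[k]` is True when gene k was up-
    or down-regulated in either experiment at the chosen confidence level.
    """
    pairs = [(xk, yk) for xk, yk, keep in zip(x, y, significant) if keep]
    sum_xy = sum(a * b for a, b in pairs)
    sum_xx = sum(a * a for a, _ in pairs)
    sum_yy = sum(b * b for _, b in pairs)
    if sum_xx == 0.0 or sum_yy == 0.0:
        raise ValueError("no signal among the selected genes")
    return sum_xy / math.sqrt(sum_xx * sum_yy)
```

Restricting the sum to significant genes keeps the many unregulated genes, whose log ratios are dominated by measurement noise, from diluting the correlation.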
  • This confidence level was assigned based on an error model which assigns a lognormal probability distribution to each gene's expression ratio with characteristic width based on the observed scatter in its repeated measurements (repeated arrays at the same nominal experimental conditions) and on the individual array hybridization quality. This latter dependence was derived from control experiments in which both Cy3 and Cy5 samples were derived from the same RNA sample. For large numbers of repeated measurements the error reduces to the observed scatter. For a single measurement the error is based on the array quality and the spot intensity.
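A lognormal distribution for the expression ratio corresponds to a normal distribution for its log10. The sketch below shows how a confidence level could be assigned under that model, assuming the width `sigma` of the log-ratio distribution has already been estimated (from repeat scatter, array quality, or spot intensity). How `sigma` is derived is not shown here, and the function names are hypothetical.

```python
import math


def two_sided_p_value(log_ratio, sigma):
    """Chance that an unregulated gene shows a log10 ratio at least this far from 0.

    ASSUMPTION: log10 ratios of unregulated genes are normally distributed
    around 0 with standard deviation `sigma`.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.erfc(abs(log_ratio) / (sigma * math.sqrt(2.0)))


def is_regulated(log_ratio, sigma, confidence=0.95):
    """True when the gene departs from unity at the given confidence level."""
    return two_sided_p_value(log_ratio, sigma) < 1.0 - confidence
```

Under these assumptions a gene is called regulated at the 95% level when its log ratio lies more than about 1.96 widths away from zero.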
  • Expression ratios are based on mean intensities over each spot.
  • The occasional smaller spots have fewer image pixels in the average. This does not degrade accuracy noticeably until the number of pixels falls below ten, in which case the spot is rejected from the data set.
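The per-spot averaging and the ten-pixel rejection rule can be sketched as below. The data layout (a flat list of pixel intensities per spot and a single frame background value) and the names `spot_mean_intensity`, `background`, and `min_pixels` are assumptions made for illustration.

```python
def spot_mean_intensity(pixels, background=0.0, min_pixels=10):
    """Background-subtracted mean intensity of one spot in one channel.

    Returns None when the spot has fewer than `min_pixels` pixels, mirroring
    the rule that such spots are rejected from the data set.
    """
    if len(pixels) < min_pixels:
        return None
    return sum(p - background for p in pixels) / len(pixels)
```

A caller would compute this once per channel and drop the spot from the ratio calculation if either result is None.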
  • Wander of spot positions with respect to the nominal grid is adaptively tracked in array subregions by the image processing software.
  • Unequal spot wander within a subregion greater than half a spot spacing is problematic for the automated quantitation algorithms; in this case the spot is rejected from analysis based on human inspection of the wander. Any partially overlapping spots are excluded from the data set. Typically, fewer than 1% of spots are rejected for these reasons.
US10/058,696 1998-12-28 2002-01-28 Statistical combining of cell expression profiles Abandoned US20020128781A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US10/058,696 US20020128781A1 (en) 1998-12-28 2002-01-28 Statistical combining of cell expression profiles
US10/287,130 US7966130B2 (en) 1998-12-28 2002-11-04 Systems and methods for determining a weighted mean intensity
US11/042,653 US20050130215A1 (en) 1998-12-28 2005-01-24 Systems and methods for correcting error in biological response signal data
US11/042,654 US8521441B2 (en) 1998-12-28 2005-01-24 Method and computer program product for reducing fluorophore-specific bias
US11/303,121 US7565251B2 (en) 1998-12-28 2005-12-12 Systems and methods for evaluating the significance of differences in biological measurements

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/222,596 US6351712B1 (en) 1998-12-28 1998-12-28 Statistical combining of cell expression profiles
US10/058,696 US20020128781A1 (en) 1998-12-28 2002-01-28 Statistical combining of cell expression profiles

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/222,596 Division US6351712B1 (en) 1998-12-28 1998-12-28 Statistical combining of cell expression profiles

Related Child Applications (4)

Application Number Title Priority Date Filing Date
US10/287,130 Division US7966130B2 (en) 1998-12-28 2002-11-04 Systems and methods for determining a weighted mean intensity
US11/042,654 Continuation US8521441B2 (en) 1998-12-28 2005-01-24 Method and computer program product for reducing fluorophore-specific bias
US11/042,653 Continuation US20050130215A1 (en) 1998-12-28 2005-01-24 Systems and methods for correcting error in biological response signal data
US11/303,121 Continuation US7565251B2 (en) 1998-12-28 2005-12-12 Systems and methods for evaluating the significance of differences in biological measurements

Publications (1)

Publication Number Publication Date
US20020128781A1 (en) 2002-09-12

Family

ID=22832874

Family Applications (6)

Application Number Title Priority Date Filing Date
US09/222,596 Expired - Fee Related US6351712B1 (en) 1998-12-28 1998-12-28 Statistical combining of cell expression profiles
US10/058,696 Abandoned US20020128781A1 (en) 1998-12-28 2002-01-28 Statistical combining of cell expression profiles
US10/287,130 Expired - Fee Related US7966130B2 (en) 1998-12-28 2002-11-04 Systems and methods for determining a weighted mean intensity
US11/042,654 Expired - Fee Related US8521441B2 (en) 1998-12-28 2005-01-24 Method and computer program product for reducing fluorophore-specific bias
US11/042,653 Abandoned US20050130215A1 (en) 1998-12-28 2005-01-24 Systems and methods for correcting error in biological response signal data
US11/303,121 Expired - Fee Related US7565251B2 (en) 1998-12-28 2005-12-12 Systems and methods for evaluating the significance of differences in biological measurements

Country Status (7)

Country Link
US (6) US6351712B1
EP (1) EP1141411A4
JP (1) JP2002533701A
CN (1) CN1335893A
AU (1) AU774830B2
CA (1) CA2356696C
WO (1) WO2000039339A1

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050130215A1 (en) * 1998-12-28 2005-06-16 Roland Stoughton Systems and methods for correcting error in biological response signal data
US9336302B1 (en) 2012-07-20 2016-05-10 Zuci Realty Llc Insight and algorithmic clustering for automated synthesis
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis

Families Citing this family (111)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6456942B1 (en) * 1998-01-25 2002-09-24 Combimatrix Corporation Network infrastructure for custom microarray synthesis and analysis
US6990221B2 (en) * 1998-02-07 2006-01-24 Biodiscovery, Inc. Automated DNA array image segmentation and analysis
SE9801420D0 (sv) * 1998-04-22 1998-04-22 Mikael Kubista Metod för karakterisering av enstaka testprover
WO2000046405A2 (en) * 1999-02-02 2000-08-10 Bernhard Palsson Methods for identifying drug targets based on genomic sequence data
US6142681A (en) 1999-02-22 2000-11-07 Vialogy Corporation Method and apparatus for interpreting hybridized bioelectronic DNA microarray patterns using self-scaling convergent reverberant dynamics
US6136541A (en) 1999-02-22 2000-10-24 Vialogy Corporation Method and apparatus for analyzing hybridized biochip patterns using resonance interactions employing quantum expressor functions
US6731781B1 (en) * 1999-09-30 2004-05-04 Biodiscovery, Inc. System and method for automatically processing microarrays
US7099502B2 (en) * 1999-10-12 2006-08-29 Biodiscovery, Inc. System and method for automatically processing microarrays
US6607885B1 (en) * 1999-10-15 2003-08-19 E. I. Du Pont De Nemours And Company Method for high-density microarray medicated gene expression profiling
US6424921B1 (en) * 2000-07-10 2002-07-23 Incyte Genomics, Inc. Averaging multiple hybridization arrays
JP4833495B2 (ja) * 2000-08-03 2011-12-07 アレイジェット リミテッド インクジェットプリントヘッドによるマイクロアレイの高度平行構成体
JP2002065259A (ja) * 2000-08-24 2002-03-05 Shinya Watanabe 核酸標識方法および核酸標識用キット
US6713257B2 (en) 2000-08-25 2004-03-30 Rosetta Inpharmatics Llc Gene discovery using microarrays
US20020107640A1 (en) * 2000-11-14 2002-08-08 Ideker Trey E. Methods for determining the true signal of an analyte
US20030130798A1 (en) * 2000-11-14 2003-07-10 The Institute For Systems Biology Multiparameter integration methods for the analysis of biological networks
CA2327527A1 (en) * 2000-12-27 2002-06-27 Geneka Biotechnologie Inc. Method for the normalization of the relative fluorescence intensities of two rna samples in hybridization arrays
US7127379B2 (en) * 2001-01-31 2006-10-24 The Regents Of The University Of California Method for the evolutionary design of biochemical reaction networks
CA2439260C (en) * 2001-03-01 2012-10-23 The Regents Of The University Of California Models and methods for determining systemic properties of regulated reaction networks
AU2002254162A1 (en) * 2001-03-08 2002-09-24 Chromavision Medical Systems, Inc. Apparatus and method for labeling rows and columns in an irregular array
US7219016B2 (en) * 2001-04-20 2007-05-15 Yale University Systems and methods for automated analysis of cells and tissues
AU2002307486A1 (en) * 2001-04-26 2002-11-11 Rosetta Inpharmatics, Inc. Methods and compositions for utilizing changes of hybridization signals during approach to equilibrium
EP1402456A4 (en) * 2001-05-22 2007-11-07 Entelos Inc METHODS FOR PREDICTING THE BIOLOGICAL ACTIVITIES OF CELLULAR CONSTITUENTS
US20030104426A1 (en) * 2001-06-18 2003-06-05 Linsley Peter S. Signature genes in chronic myelogenous leukemia
US6691042B2 (en) 2001-07-02 2004-02-10 Rosetta Inpharmatics Llc Methods for generating differential profiles by combining data obtained in separate measurements
US6768961B2 (en) * 2001-09-14 2004-07-27 Yield Dyamics, Inc. System and method for analyzing error information from a semiconductor fabrication process
US20030073085A1 (en) * 2001-10-05 2003-04-17 Fang Lai Amplifying expressed sequences from genomic DNA of higher-order eukaryotic organisms for DNA arrays
WO2003035865A1 (fr) * 2001-10-22 2003-05-01 Takara Bio Inc. Procede de marquage d'acides nucleiques
US7751981B2 (en) * 2001-10-26 2010-07-06 The Regents Of The University Of California Articles of manufacture and methods for modeling Saccharomyces cerevisiae metabolism
AU2002350131A1 (en) * 2001-11-09 2003-05-26 Gene Logic Inc. System and method for storage and analysis of gene expression data
US7418351B2 (en) * 2002-01-31 2008-08-26 Rosetta Inpharmatics Llc Methods for analysis of measurement errors in measured signals
AU2003216257A1 (en) * 2002-02-11 2003-09-04 Syngenta Participations Ag Gene function inferring using gene expression data
WO2003070938A1 (fr) * 2002-02-21 2003-08-28 Ajinomoto Co., Inc. Analyseur de donnees d'expression genique et procede, programme et support d'enregistrement pour l'analyse des donnees d'expression genique
AU2003213786A1 (en) * 2002-03-07 2003-09-22 University Of Utah Research Foundation Methods for identifying large subsets of differentially expressed genes based on multivariate microarray data analysis
US20030224363A1 (en) * 2002-03-19 2003-12-04 Park Sung M. Compositions and methods for modeling bacillus subtilis metabolism
US8229673B2 (en) * 2002-03-29 2012-07-24 Genomatica, Inc. Human metabolic models and methods
US8949032B2 (en) * 2002-03-29 2015-02-03 Genomatica, Inc. Multicellular metabolic models and methods
US7856317B2 (en) * 2002-06-14 2010-12-21 Genomatica, Inc. Systems and methods for constructing genomic-based phenotypic models
WO2004031885A2 (en) * 2002-08-01 2004-04-15 Gene Logic Inc. Method and system for managing and querying gene expression data according to quality
US7512496B2 (en) * 2002-09-25 2009-03-31 Soheil Shams Apparatus, method, and computer program product for determining confidence measures and combined confidence measures for assessing the quality of microarrays
US7869957B2 (en) * 2002-10-15 2011-01-11 The Regents Of The University Of California Methods and systems to identify operational reaction pathways
AU2003222214B2 (en) * 2002-10-15 2010-08-12 The Regents Of The University Of California Methods and systems to identify operational reaction pathways
US7996155B2 (en) 2003-01-22 2011-08-09 Microsoft Corporation ANOVA method for data analysis
JP2004254298A (ja) * 2003-01-30 2004-09-09 Ricoh Co Ltd 画像処理装置、プログラム及び記憶媒体
EP1625394A4 (en) * 2003-04-23 2008-02-06 Bioseek Inc METHOD FOR ANALYZING BIOLOGICAL DATA PROFILES
US8301388B2 (en) * 2003-05-05 2012-10-30 Amplicon Express, Inc. Pool and superpool matrix coding and decoding designs and methods
US20040229226A1 (en) * 2003-05-16 2004-11-18 Reddy M. Parameswara Reducing microarray variation with internal reference spots
EP1628993A4 (en) * 2003-05-16 2010-04-07 Rosetta Inpharmatics Llc METHOD AND COMPOSITIONS FOR RNA INTERFERENCE
JP2004348674A (ja) * 2003-05-26 2004-12-09 Noritsu Koki Co Ltd 領域検出方法及びその装置
US20050143628A1 (en) * 2003-06-18 2005-06-30 Xudong Dai Methods for characterizing tissue or organ condition or status
KR20060120063A (ko) 2003-09-29 2006-11-24 패스워크 인포메틱스 아이엔씨 생물학적 특징의 탐지 시스템 및 생물학적 특징 탐지 방법
US20050069863A1 (en) * 2003-09-29 2005-03-31 Jorge Moraleda Systems and methods for analyzing gene expression data for clinical diagnostics
US8321137B2 (en) * 2003-09-29 2012-11-27 Pathwork Diagnostics, Inc. Knowledge-based storage of diagnostic models
JP2005106755A (ja) * 2003-10-01 2005-04-21 Research Organization Of Information & Systems マイクロアレイ実験等から得られるデータの新規解析方法
US7519565B2 (en) * 2003-11-03 2009-04-14 Cloudmark, Inc. Methods and apparatuses for classifying electronic documents
US20050149546A1 (en) * 2003-11-03 2005-07-07 Prakash Vipul V. Methods and apparatuses for determining and designating classifications of electronic documents
US20050094807A1 (en) * 2003-11-04 2005-05-05 John Silzel Accuracy array assay system and method
US20050196782A1 (en) * 2003-12-23 2005-09-08 Kiefer Michael C. Universal amplification of fragmented RNA
US7454293B2 (en) * 2004-01-07 2008-11-18 University Of Hawai'i Methods for enhanced detection and analysis of differentially expressed genes using gene chip microarrays
AU2005225999A1 (en) * 2004-02-27 2005-10-06 Bioseek, Inc. Biological dataset profiling of asthma and atopy
CA2558808A1 (en) 2004-03-05 2005-09-22 Rosetta Inpharmatics Llc Classification of breast cancer patients using a combination of clinical criteria and informative genesets
US7881872B2 (en) * 2004-03-12 2011-02-01 Microsoft Corporation Methods of analyzing multi-channel profiles
US7660709B2 (en) * 2004-03-18 2010-02-09 Van Andel Research Institute Bioinformatics research and analysis system and methods associated therewith
US7653260B2 (en) * 2004-06-17 2010-01-26 Carl Zeis MicroImaging GmbH System and method of registering field of view
US8582924B2 (en) * 2004-06-30 2013-11-12 Carl Zeiss Microimaging Gmbh Data structure of an image storage and retrieval system
US7542854B2 (en) * 2004-07-22 2009-06-02 International Business Machines Corporation Method for discovering gene regulatory models and genetic networks using relational fuzzy models
US8484000B2 (en) 2004-09-02 2013-07-09 Vialogy Llc Detecting events of interest using quantum resonance interferometry
EP1910536B1 (en) * 2005-07-26 2009-09-09 Council Of Scientific And Industrial Research Methods for identifying genes that increase yeast stress tolerance, and use of these genes for yeast strain improvement
US7437249B2 (en) * 2006-06-30 2008-10-14 Agilent Technologies, Inc. Methods and systems for detrending signal intensity data from chemical arrays
CN101790731B (zh) * 2007-03-16 2013-11-06 纳特拉公司 用于清除遗传数据干扰并确定染色体拷贝数的系统和方法
US20090023182A1 (en) * 2007-07-18 2009-01-22 Schilling Christophe H Complementary metabolizing organisms and methods of making same
WO2009052417A2 (en) * 2007-10-18 2009-04-23 Rubinstein Wendy S Breast cancer profiles and methods of use thereof
US8105777B1 (en) 2008-02-13 2012-01-31 Nederlands Kanker Instituut Methods for diagnosis and/or prognosis of colon cancer
CN101250584B (zh) * 2008-03-19 2012-06-13 南京大学 一种识别显著差异表达基因集合的方法
US8086502B2 (en) 2008-03-31 2011-12-27 Ebay Inc. Method and system for mobile publication
US7991646B2 (en) 2008-10-30 2011-08-02 Ebay Inc. Systems and methods for marketplace listings using a camera enabled mobile device
EP3255146B1 (en) 2009-03-16 2019-05-15 Pangu Biopharma Limited Compositions and methods comprising histidyl-trna synthetase splice variants having non-canonical biological activities
US8825660B2 (en) * 2009-03-17 2014-09-02 Ebay Inc. Image-based indexing in a network-based marketplace
US20100310576A1 (en) 2009-03-31 2010-12-09 Adams Ryan A COMPOSITIONS AND METHODS COMPRISING ASPARTYL-tRNA SYNTHETASES HAVING NON-CANONICAL BIOLOGICAL ACTIVITIES
US9792638B2 (en) 2010-03-29 2017-10-17 Ebay Inc. Using silhouette images to reduce product selection error in an e-commerce environment
US9405773B2 (en) * 2010-03-29 2016-08-02 Ebay Inc. Searching for more products like a specified product
US8819052B2 (en) 2010-03-29 2014-08-26 Ebay Inc. Traffic driver for suggesting stores
US8861844B2 (en) 2010-03-29 2014-10-14 Ebay Inc. Pre-computing digests for image similarity searching of image-based listings in a network-based publication system
US8949252B2 (en) 2010-03-29 2015-02-03 Ebay Inc. Product category optimization for image similarity searching of image-based listings in a network-based publication system
JP6066900B2 (ja) 2010-04-26 2017-01-25 エータイアー ファーマ, インコーポレイテッド システイニルtRNA合成酵素のタンパク質フラグメントに関連した治療用、診断用および抗体組成物の革新的発見
CA2797362C (en) 2010-04-27 2020-12-08 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of isoleucyl trna synthetases
JP6008837B2 (ja) 2010-04-28 2016-10-19 エータイアー ファーマ, インコーポレイテッド アラニルtRNA合成酵素のタンパク質フラグメントに関連した治療用、診断用および抗体組成物の革新的発見
WO2011135459A2 (en) 2010-04-29 2011-11-03 Medical Prognosis Institute A/S Methods and devices for predicting treatment efficacy
WO2011150279A2 (en) 2010-05-27 2011-12-01 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of glutaminyl-trna synthetases
CA2797393C (en) 2010-04-29 2020-03-10 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of valyl trna synthetases
WO2011139854A2 (en) 2010-04-29 2011-11-10 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of asparaginyl trna synthetases
EP2566495B1 (en) 2010-05-03 2017-03-01 aTyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of phenylalanyl-alpha-trna synthetases
JP6008841B2 (ja) 2010-05-03 2016-10-19 エータイアー ファーマ, インコーポレイテッド メチオニルtRNA合成酵素のタンパク質フラグメントに関連した治療用、診断用および抗体組成物の革新的発見
CA2797277C (en) 2010-05-03 2021-02-23 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of arginyl-trna synthetases
EP2566499B1 (en) 2010-05-04 2017-01-25 aTyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of p38 multi-trna synthetase complex
CN103200953B (zh) 2010-05-14 2017-02-15 Atyr 医药公司 与苯丙氨酰‑β‑tRNA合成酶的蛋白片段相关的治疗、诊断和抗体组合物的创新发现
JP6027965B2 (ja) 2010-05-17 2016-11-16 エータイアー ファーマ, インコーポレイテッド ロイシルtRNA合成酵素のタンパク質フラグメントに関連した治療用、診断用および抗体組成物の革新的発見
WO2011153277A2 (en) 2010-06-01 2011-12-08 Atyr Pharma, Inc. Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of lysyl-trna synthetases
US8407221B2 (en) 2010-07-09 2013-03-26 International Business Machines Corporation Generalized notion of similarities between uncertain time series
AU2011289831C1 (en) 2010-07-12 2017-06-15 Pangu Biopharma Limited Innovative discovery of therapeutic, diagnostic, and antibody compositions related to protein fragments of glycyl-tRNA synthetases
EP2608801B1 (en) 2010-08-25 2019-08-21 aTyr Pharma, Inc. INNOVATIVE DISCOVERY OF THERAPEUTIC, DIAGNOSTIC, AND ANTIBODY COMPOSITIONS RELATED TO PROTEIN FRAGMENTS OF TYROSYL-tRNA SYNTHETASES
US8412594B2 (en) 2010-08-28 2013-04-02 Ebay Inc. Multilevel silhouettes in an online shopping environment
EP2694963B1 (en) 2011-04-01 2017-08-02 Qiagen Gene expression signature for wnt/b-catenin signaling pathway and use thereof
JP6039656B2 (ja) 2011-06-01 2016-12-07 メディカル プログノシス インスティテュート エー/エス 癌再発の予後予測のための方法および装置
KR20140123571A (ko) 2012-02-16 2014-10-22 에이티와이알 파마, 인코포레이티드 자가면역 및 염증성 질환의 치료를 위한 히스티딜­trna 신테타제
US9934522B2 (en) 2012-03-22 2018-04-03 Ebay Inc. Systems and methods for batch- listing items stored offline on a mobile device
JP6309019B2 (ja) 2012-11-27 2018-04-11 ポンティフィシア・ウニベルシダッド・カトリカ・デ・チレPontificia Universidad Catolica de Chile 甲状腺腫瘍を診断するための組成物および方法
WO2014172390A2 (en) 2013-04-15 2014-10-23 Cedars-Sinai Medical Center Methods for detecting cancer metastasis
WO2014195032A1 (en) 2013-06-07 2014-12-11 Medical Prognosis Institute A/S Methods and devices for predicting treatment efficacy of fulvestrant in cancer patients
EP3074039A4 (en) 2013-11-26 2017-10-11 The Brigham and Women's Hospital, Inc. Compositions and methods for modulating an immune response
EP3169815B1 (en) 2014-07-15 2020-12-23 Ontario Institute For Cancer Research Methods and devices for predicting anthracycline treatment efficacy
US20180165424A1 (en) * 2016-12-14 2018-06-14 Exxonmobil Research And Engineering Company Method for dynamic bias management between online process analyzers and referee tests

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5510270A (en) * 1989-06-07 1996-04-23 Affymax Technologies N.V. Synthesis and screening of immobilized oligonucleotide arrays
US5552270A (en) * 1991-03-18 1996-09-03 Institut Molekulyarnoi Biologii Imeni V.A. Methods of DNA sequencing by hybridization based on optimizing concentration of matrix-bound oligonucleotide and device for carrying out same
US5569588A (en) * 1995-08-09 1996-10-29 The Regents Of The University Of California Methods for drug screening
US5777888A (en) * 1995-08-09 1998-07-07 Regents Of The University Of California Systems for generating and analyzing stimulus-response output signal matrices
US5800992A (en) * 1989-06-07 1998-09-01 Fodor; Stephen P.A. Method of detecting nucleic acids
US5965352A (en) * 1998-05-08 1999-10-12 Rosetta Inpharmatics, Inc. Methods for identifying pathways of drug action
US6245517B1 (en) * 1998-09-29 2001-06-12 The United States Of America As Represented By The Department Of Health And Human Services Ratio-based decisions and the quantitative analysis of cDNA micro-array images
US6351712B1 (en) * 1998-12-28 2002-02-26 Rosetta Inpharmatics, Inc. Statistical combining of cell expression profiles

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US130215A (en) * 1872-08-06 Improvement in steam and air brakes
US164273A (en) * 1875-06-08 Improvement in hand-stamps
US5155916A (en) * 1991-03-21 1992-10-20 Scientific Drilling International Error reduction in compensation of drill string interference for magnetic survey tools
US5807522A (en) * 1994-06-17 1998-09-15 The Board Of Trustees Of The Leland Stanford Junior University Methods for fabricating microarrays of biological samples
JP2002515738A (ja) * 1996-01-23 2002-05-28 アフィメトリックス,インコーポレイティド 核酸分析法
US6165709A (en) * 1997-02-28 2000-12-26 Fred Hutchinson Cancer Research Center Methods for drug target screening
EP1078256B1 (en) * 1998-04-22 2002-11-27 Imaging Research, Inc. Process for evaluating chemical and biological assays
US6218122B1 (en) * 1998-06-19 2001-04-17 Rosetta Inpharmatics, Inc. Methods of monitoring disease states and therapies using gene expression profiles
US6171794B1 (en) * 1998-07-13 2001-01-09 Rosetta Inpharmatics, Inc. Methods for determining cross-hybridization
US6174794B1 (en) * 1998-08-20 2001-01-16 Advanced Micro Devices, Inc. Method of making high performance MOSFET with polished gate and source/drain feature
US6146830A (en) * 1998-09-23 2000-11-14 Rosetta Inpharmatics, Inc. Method for determining the presence of a number of primary targets of a drug
US6950752B1 (en) * 1998-10-27 2005-09-27 Rosetta Inpharmatics Llc Methods for removing artifact from biological profiles
US6203987B1 (en) * 1998-10-27 2001-03-20 Rosetta Inpharmatics, Inc. Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns
US6453241B1 (en) * 1998-12-23 2002-09-17 Rosetta Inpharmatics, Inc. Method and system for analyzing biological response signal data
US6230987B1 (en) * 2000-05-23 2001-05-15 Hai Quang Truong Applicators for allowing a predetermined fluid flow for dissolving and distributing soluble substances

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050130215A1 (en) * 1998-12-28 2005-06-16 Roland Stoughton Systems and methods for correcting error in biological response signal data
US20050164273A1 (en) * 1998-12-28 2005-07-28 Roland Stoughton Statistical combining of cell expression profiles
US20060190191A1 (en) * 1998-12-28 2006-08-24 Rosetta Inpharmatics, Llc Systems and methods for evaluating the significance of differences in biological measurements
US7565251B2 (en) 1998-12-28 2009-07-21 Rosetta Inpharmatics Llc Systems and methods for evaluating the significance of differences in biological measurements
US8521441B2 (en) 1998-12-28 2013-08-27 Microsoft Corporation Method and computer program product for reducing fluorophore-specific bias
US9336302B1 (en) 2012-07-20 2016-05-10 Zuci Realty Llc Insight and algorithmic clustering for automated synthesis
US9607023B1 (en) 2012-07-20 2017-03-28 Ool Llc Insight and algorithmic clustering for automated synthesis
US10318503B1 (en) 2012-07-20 2019-06-11 Ool Llc Insight and algorithmic clustering for automated synthesis
US11216428B1 (en) 2012-07-20 2022-01-04 Ool Llc Insight and algorithmic clustering for automated synthesis
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis

Also Published As

Publication number Publication date
US20030093227A1 (en) 2003-05-15
AU774830B2 (en) 2004-07-08
AU2385500A (en) 2000-07-31
US20060190191A1 (en) 2006-08-24
JP2002533701A (ja) 2002-10-08
WO2000039339A1 (en) 2000-07-06
US7565251B2 (en) 2009-07-21
CA2356696A1 (en) 2000-07-06
CA2356696C (en) 2011-08-02
US7966130B2 (en) 2011-06-21
US8521441B2 (en) 2013-08-27
US6351712B1 (en) 2002-02-26
CN1335893A (zh) 2002-02-13
EP1141411A1 (en) 2001-10-10
EP1141411A4 (en) 2007-05-02
US20050164273A1 (en) 2005-07-28
US20050130215A1 (en) 2005-06-16

Similar Documents

Publication Publication Date Title
US6351712B1 (en) Statistical combining of cell expression profiles
Kurella et al. DNA microarray analysis of complex biologic processes
Deyholos et al. High‐density microarrays for gene expression analysis
US7897750B2 (en) Strategies for gene expression analysis
US7013221B1 (en) Iterative probe design and detailed expression profiling with flexible in-situ synthesis arrays
US20050282227A1 (en) Treatment discovery based on CGH analysis
US20020045169A1 (en) Gene discovery using microarrays
US20050214824A1 (en) Methods for monitoring the expression of alternatively spliced genes
Burgess Gene expression studies using microarrays
Eickhoff et al. Tissue gene expression analysis using arrayed normalized cDNA libraries
US20040091933A1 (en) Methods for genetic interpretation and prediction of phenotype
US20060281126A1 (en) Methods for monitoring the expression of alternatively spliced genes
US7371516B1 (en) Methods for determining the specificity and sensitivity of oligonucleotides for hybridization
EP4116432A1 (en) Target-enriched multiplexed parallel analysis for assessment of fetal dna samples
EP1141415A1 (en) Methods for robust discrimination of profiles
KR20010081098A (ko) Methods for using co-regulated genesets to enhance detection and classification of gene expression patterns
Lockhart et al. DNA arrays and gene expression analysis in the brain
Teo Genotype calling for the Illumina platform
Ali et al. Developmental biology: an array of new possibilities
WO2002064743A2 (en) Confirming the exon content of rna transcripts by pcr using primers complementary to each respective exon
Compton et al. Gene Expression Profiling
Dago Performance assessment of different microarray designs using RNA-Seq as reference
WO2007087302A2 (en) Oligonucleotide matrix and methods of use
Choi DNA chips and microarray analysis: an overview
US20080090236A1 (en) Methods and systems for identifying tumor progression in comparative genomic hybridization data

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION
AS Assignment Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034766/0509 Effective date: 20141014