EP1038245A1 - Method and apparatus for providing an expression data mining database and laboratory information management - Google Patents

Method and apparatus for providing an expression data mining database and laboratory information management

Info

Publication number
EP1038245A1
Authority
EP
European Patent Office
Prior art keywords
information
experiments
sample
samples
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP99954613A
Other languages
German (de)
French (fr)
Other versions
EP1038245A4 (en
Inventor
David J. Balaban
Elina Khurgin
Derek H. Bernhart
John Sowatsky
Arun Aggarwal
Luis Jevons
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Affymetrix Inc
Original Assignee
Affymetrix Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/354,935 external-priority patent/US6185561B1/en
Application filed by Affymetrix Inc filed Critical Affymetrix Inc
Priority claimed from US09/397,494 external-priority patent/US20030028501A1/en
Priority claimed from PCT/US1999/021305 external-priority patent/WO2000016220A1/en
Publication of EP1038245A1 publication Critical patent/EP1038245A1/en
Publication of EP1038245A4 publication Critical patent/EP1038245A4/en
Withdrawn legal-status Critical Current

Definitions

  • the present invention relates to computer systems and more particularly to computer systems for mining and for managing laboratory operations about gene expression levels.
  • Devices and computer systems have been developed for collecting information about gene expression or expressed sequence tags (EST) in large numbers of samples.
  • PCT application WO92/10588, incorporated herein by reference for all purposes, describes techniques for sequence checking nucleic acids and other materials. Probes for performing these operations may be formed in arrays according to the pioneering techniques disclosed in U.S. Patent No. 5,143,854 and U.S. Patent No. 5,571,639, for example. Both of these U.S. Patents are incorporated herein by reference for all purposes.
  • an array of nucleic acid probes is fabricated at known locations on a chip or substrate.
  • a fluorescent label attached to a nucleic acid is then brought into contact with the chip and a scanner generates an image file indicating the locations where the labeled nucleic acids bound to the chip. Based upon the identities of the probes at these locations, information such as the monomer sequence of DNA or RNA can be extracted.
  • genes or expressed sequence tags may be collected on a large scale in many ways, including the probe array techniques described above.
  • One of the objectives in collecting this information is the identification of genes or ESTs whose expression is of particular importance.
  • researchers use such techniques to answer questions such as: 1) Which genes are expressed in cells of a malignant tumor but not expressed in either healthy tissue or tissue treated according to a particular regime? 2) Which genes or ESTs are expressed in particular organs but not in others? 3) Which genes or ESTs are expressed in particular species but not in others?
  • the present invention provides techniques for organizing expression or concentration information in a way that facilitates mining.
  • a database model is provided which may organize information relating to, e.g., sample preparation, expression analysis of experiment results, and intermediate and final results of mining gene expression measurements, gene sets and the like.
  • the model is readily translatable into database languages such as SQL and the like.
  • the database model can scale to permit mining of gene expression measurements collected from large numbers of samples.
  • a computer based method for mining a plurality of experiment information includes a variety of steps such as collecting information from experiments and chip designs.
  • the method can include steps of selecting experiments to be mined. Experiment results and other information can be organized by experimental analysis, and the like.
  • a step of defining one or more groupings for the experiments to be mined can also be part of the method.
  • the method also includes a step of selecting, based upon the groupings, information about the experiments to be mined to form a plurality of resulting information. This resulting information can include one or more resulting gene sets, and the like.
  • the method formats the resulting information for viewing by a user. The combination of these steps can provide to the user the ability to access experiment information.
  • visualization techniques can be used in conjunction with the steps of the method to enable users to more easily understand the results of the data mining. Further, in some embodiments, a step of recording conclusions about the results of the data mining can also be part of the method.
  • a method for working with expression information includes a variety of steps such as collecting information about results of experiments.
  • a step of gathering information about samples and information about the experiments, which can comprise an experimental analysis and the like, is also part of the method.
  • the step of adding one or more attributes to the information about the experiments can also be performed.
  • the method then transforms the plurality of results of experiments into a plurality of transformed information. Transformations can include normalizing, de-normalizing, aggregation, scaling, and the like. Steps of mining the plurality of transformed information and visualizing the plurality of transformed information can also be part of the method.
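The transformations named above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names, the target mean of 100, and the use of an arithmetic mean for aggregation are all assumptions made for the example.

```python
from statistics import mean

def scale_to_target(values, target_mean=100.0):
    """Scale values linearly so that their mean equals target_mean (assumed scheme)."""
    factor = target_mean / mean(values)
    return [v * factor for v in values]

def average_replicates(replicates):
    """Aggregate replicate experiments gene by gene using the arithmetic mean."""
    genes = replicates[0].keys()
    return {g: mean(r[g] for r in replicates) for g in genes}

# Hypothetical expression values from one experiment.
raw = [50.0, 150.0, 400.0]
scaled = scale_to_target(raw, target_mean=100.0)

# Hypothetical replicate experiments measuring the same two genes.
rep_a = {"geneA": 90.0, "geneB": 210.0}
rep_b = {"geneA": 110.0, "geneB": 190.0}
averaged = average_replicates([rep_a, rep_b])
```

Scaling to a common mean makes experiments comparable before mining, and averaging replicates reduces each group of replicates to one value per gene.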
  • the present invention provides techniques for improved monitoring of genetic expression or sequence analysis. More particularly, the present invention provides a method for managing laboratory operations for monitoring expression or performing sequence analysis.
  • a computer based method for managing information about a plurality of experiments conducted on a plurality of samples is provided. Each experiment can provide an indication of the degree that particular genes are expressed in a sample.
  • the method includes a variety of steps such as registering at least one of the plurality of samples with a centralized database.
  • the method can include steps of tracking a plurality of information about the samples and tracking a plurality of information about the experiments.
  • a step of producing a sample history about the plurality of samples from the plurality of information can also be a part of the method.
  • the method can include filtering the information about the experiments and the information about the samples according to parameters selected by a user.
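The registering, tracking, history, and filtering steps described above might look as follows in a small in-memory sketch. The class names, attribute names, and event strings are hypothetical; the text describes a centralized database, for which a Python dictionary stands in here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class SampleRecord:
    sample_id: str
    attributes: dict
    history: list = field(default_factory=list)

class SampleRegistry:
    """In-memory stand-in for the centralized database (an assumption)."""

    def __init__(self):
        self._samples = {}

    def register(self, sample_id, **attributes):
        if sample_id in self._samples:
            raise ValueError(f"sample {sample_id!r} is already registered")
        record = SampleRecord(sample_id, dict(attributes))
        self._samples[sample_id] = record
        self.track(sample_id, "registered")
        return record

    def track(self, sample_id, event):
        # Each tracked event is stored with a UTC timestamp.
        stamp = datetime.now(timezone.utc)
        self._samples[sample_id].history.append((stamp, event))

    def sample_history(self, sample_id):
        return [event for _, event in self._samples[sample_id].history]

    def filter_by(self, **criteria):
        # Keep samples whose attributes match every user-selected parameter.
        return sorted(
            sid for sid, rec in self._samples.items()
            if all(rec.attributes.get(k) == v for k, v in criteria.items())
        )

registry = SampleRegistry()
registry.register("S1", organ="liver", species="human")
registry.register("S2", organ="kidney", species="human")
registry.track("S1", "hybridized to chip")
registry.track("S1", "scanned")
```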
  • the information can be made available for publishing to a variety of targets such as a public database.
  • the combination of these steps can provide a web based user interface that can enable the user to access the information.
  • the experimental result information can be entered in a format that can provide cross platform use and sharing of the information.
  • One such format is Genetic Analysis Technology Consortium ("GATC"), a standard for genomic databases provided by Molecular Dynamics, of Hayward, CA, and Affymetrix, Inc., of Santa Clara, CA. Reference may be had to http://www.gatconsortium.org for further information about GATC.
  • many embodiments can use other standard formats, such as those commonly known in the art.
  • a method for viewing the results of a plurality of experiments which are stored in at least one database is provided. The method includes a variety of steps such as specifying a database to query. One or more queries can be submitted to form a result. The user can then view the result. The result may be filtered according to one or more user specified factors of interest in order to form a filtered result, which can be put into a graphical form, for example, for ease of viewing.
  • Some embodiments according to the present invention can provide better access to genetic experiment information than methods known in the prior art.
  • the present invention is more cost effective than conventional techniques.
  • Embodiments can provide answers to queries such as, "show all genes where the gene expression value is greater than or equal to 100, where at least three genes out of four respond to the query," as well as answers to many other and varied useful queries.
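One way to read the example query is "genes whose expression value is at least 100 in at least three of four experiments." Under that interpretation, which is an assumption, the query could be expressed in SQL and run through Python's sqlite3 module as sketched below. The table name, column names, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expression (gene TEXT, experiment TEXT, value REAL)")

# Hypothetical measurements for three genes across four experiments.
rows = [
    ("geneA", "exp1", 120), ("geneA", "exp2", 150), ("geneA", "exp3", 101), ("geneA", "exp4", 40),
    ("geneB", "exp1", 300), ("geneB", "exp2", 20), ("geneB", "exp3", 10), ("geneB", "exp4", 500),
    ("geneC", "exp1", 100), ("geneC", "exp2", 100), ("geneC", "exp3", 100), ("geneC", "exp4", 100),
]
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)

# Keep rows meeting the threshold, then require enough experiments per gene.
query = """
    SELECT gene
    FROM expression
    WHERE value >= ?
    GROUP BY gene
    HAVING COUNT(DISTINCT experiment) >= ?
    ORDER BY gene
"""
matching_genes = [gene for (gene,) in conn.execute(query, (100, 3))]
```

Here geneA passes in three experiments and geneC in all four, while geneB passes in only two and is excluded.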
  • Another advantage provided by this approach is that the results of numerous experiments can be mined effectively using visualization techniques and set theory queries.
  • Some embodiments according to the invention are less complex than known techniques.
  • the present invention can also provide a graphical indication of laboratory analysis processes that is substantially clear for viewing.
  • Fig. 1 illustrates a representative system and process for forming and analyzing arrays of biological materials such as DNA or RNA in a particular embodiment according to the present invention.
  • Fig. 2A illustrates a computer system suitable for use in conjunction with the representative system of Fig. 1.
  • Fig. 2B illustrates a computer network suitable for use in conjunction with the representative system of Fig. 1.
  • Fig. 3 illustrates an entity relationship diagram for interpreting a database model.
  • Figs. 4A-4F illustrate a database model for maintaining information for the system and method of Fig. 1 in a particular embodiment according to the present invention.
  • FIGs. 5A-5B depict simplified flowcharts of representative process steps in select embodiments according to the invention.
  • Figs. 6A-6F illustrate representative block flow diagrams in a particular embodiment according to the present invention.
  • Figs. 7A-7O illustrate representative user interface screens in a particular embodiment according to the present invention.
  • Fig. 1' illustrates an overall system and process for forming and analyzing arrays of biological materials such as DNA or RNA in a particular embodiment according to the present invention.
  • Figs. 2A'-2B' illustrate computer systems suitable for use in conjunction with the overall system of Fig. 1' in a particular embodiment according to the present invention.
  • Figs. 3A'-3C' illustrate simplified flowcharts of representative process steps in particular embodiments according to the invention.
  • Figs. 4A'-4B' illustrate representative database structures and data formats in a particular embodiment according to the present invention.
  • Figs. 5A'-5C' illustrate representative automation screens in a particular embodiment according to the present invention.
  • Figs. 6A'-6H' illustrate representative expression analysis screens in a particular embodiment according to the present invention.
  • Figs. 7A'-7C' illustrate representative expression analysis screens for working with sets in a particular embodiment according to the present invention.
  • Figs. 8A'-8G' illustrate representative expression data mining screens in a particular embodiment according to the present invention.
  • Figs. 9A'-9F' illustrate representative annotation screens in a particular embodiment according to the present invention.
  • Figs. 10A'-10F' illustrate representative function screens in a particular embodiment according to the present invention.
  • One embodiment of the present invention operates in the context of a system for analyzing biological or other materials using arrays that themselves include probes that may be made of biological materials such as RNA or DNA.
  • the VLSIPS™ and GeneChip™ technologies provide methods of making and using very large arrays of polymers, such as nucleic acids, on very small chips. Reference may be had to U.S.
  • Nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the "target" nucleic acid). It should be understood that the probes need not be nucleic acid probes but may also be other polymers such as peptides. Peptide probes may be used to detect the concentration of peptides, polypeptides, or polymers in a sample. The probes should be carefully selected to have bonding affinity to the compound whose concentration they are to be used to measure.
  • Fig. 1 illustrates a simplified diagram of a representative example system
  • a chip design system 104 is used to design arrays of polymers, including biological polymers such as RNA or DNA.
  • Chip design system 104 may be, for example, an appropriately programmed Sun Workstation or personal computer or workstation, such as an IBM PC equivalent, and the like.
  • Chip design system 104 obtains inputs from a user regarding chip design objectives including characteristics of genes of interest, and other inputs regarding the desired features of the array.
  • chip design system 104 may obtain information regarding a specific genetic sequence of interest from bioinformatics database 102 or from external databases such as GenBank.
  • the output of chip design system 104 is a set of chip design computer files in the form of, for example, a switch matrix, as described in PCT application WO 92/10092, and other associated computer files.
  • Systems for designing chips for sequencing, sequence checking and expression analysis are disclosed in U.S. Patent No. 5,571,639 and in PCT application WO 97/10365, the entire contents of which are herein incorporated by reference for all purposes.
  • the chip design files are input to a mask design system (not shown) that designs the lithographic masks used in the fabrication of arrays of molecules such as DNA.
  • the mask design system designs the lithographic masks used in the fabrication of probe arrays.
  • the mask design system generates mask design files that are then used by a mask construction system (not shown) to construct masks or other synthesis patterns such as chrome-on-glass masks for use in the fabrication of polymer arrays.
  • the masks are used in a synthesis system (not shown).
  • the synthesis system includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip.
  • the synthesis system includes a light source and a chemical flow cell on which the substrate or chip is placed. A mask is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times for deprotection of selected regions of the chip. Selected chemical reagents are directed through the flow cell for coupling to deprotected regions, as well as for washing and other operations.
  • the substrates fabricated by the synthesis system are optionally diced into smaller chips.
  • the output of the synthesis system is a chip ready for application of a target sample. Information about the mask design, mask construction, and probe array synthesis systems is presented by way of background.
  • a biological source 112 is, for example, tissue from a plant or animal.
  • Various processing steps are applied to material from biological source 112 by a sample preparation system 114. These steps may include isolation of mRNA and precipitation of the mRNA to increase concentration. The result of the various processing steps is a target sample ready for application to the chips produced by the synthesis system 110.
  • Sample preparation methods for expression analysis are discussed in detail in WO97/10365.
  • the prepared samples include monomer nucleotide sequences such as RNA or DNA.
  • the nucleotides may or may not bond to the probes.
  • the nucleotides have been tagged with fluorescein labels to determine which probes have bonded to nucleotide sequences from the sample.
  • the prepared samples will be placed in a scanning system 118.
  • Scanning system 118 includes a detection device such as a confocal microscope or CCD (charge-coupled device) that is used to detect the location where labeled receptors have bound to the substrate.
  • the output of scanning system 118 is an image file(s) indicating, in the case of fluorescein labeled receptor, the fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate.
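As a rough illustration of how intensity as a function of position could be reduced to one value per probe location, the sketch below averages pixel intensities over square cells of an image. The square-cell layout, the cell size, and the use of a plain mean are assumptions for the example; the text does not specify how the image file is processed.

```python
from statistics import mean

def cell_intensities(image, cell_size):
    """Average pixel intensity inside each cell_size x cell_size block of the image."""
    rows, cols = len(image), len(image[0])
    result = []
    for top in range(0, rows, cell_size):
        result_row = []
        for left in range(0, cols, cell_size):
            pixels = [
                image[r][c]
                for r in range(top, min(top + cell_size, rows))
                for c in range(left, min(left + cell_size, cols))
            ]
            result_row.append(mean(pixels))
        result.append(result_row)
    return result

# Hypothetical 4x4 image of photon counts covering a 2x2 grid of probe cells.
image = [
    [10, 12, 200, 210],
    [14, 12, 190, 200],
    [50, 50, 0, 0],
    [50, 50, 0, 4],
]
grid = cell_intensities(image, cell_size=2)
```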
  • An expression analysis database 122 maintains information used to analyze expression and the results of expression analysis.
  • Contents of expression analysis database 122 may include tables listing analyses performed, analysis results, experiments performed, sample preparation protocols and parameters of these protocols, chip designs, etc. Details of one embodiment of expression analysis database 122 are described in U.S. Patent App. No. 09/122,167, entitled METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE, filed on July 24, 1998, the entire contents of which are incorporated herein by reference for all purposes.
  • One or more instantiations of expression analysis database 122 may contain information concerning the expression of many genes or ESTs as collected from many different tissue samples. It would be useful to use this information to investigate questions such as, e.g., 1) which genes or ESTs are upregulated (expressed more) or downregulated (expressed less) in diseased tissue, 2) how does gene expression vary among organs and tissue types within a species, 3) how does gene expression vary among species which share common genes, 4) how does gene expression respond to various disease treatment regimes, 5) how does gene expression vary with progression of disease, etc.
  • an expression mining database 124 is provided. Expression mining database 124 may include duplicate representations of data in expression analysis database 122.
  • Expression mining database 124 may also include various tables to facilitate mining operations conducted by a user who operates a querying and mining system 126.
  • Querying and mining system 126 includes a user interface that permits an operator to make queries to investigate expression of genes and ESTs and answer the types of questions identified above.
  • An example of a querying and mining system is described in U.S. Patent Application No. 09/122,434, entitled GENE EXPRESSION AND EVALUATION SYSTEM, filed July 24, 1998, the entire contents of which are incorporated herein by reference for all purposes.
  • Chip design system 104, analysis system 120 and control portions of exposure system 116, sample preparation system 114, and scanning system 118 may be appropriately programmed computers such as a Sun workstation or IBM-compatible PC.
  • An independent computer for each system may perform the computer-implemented functions of these systems or one computer may combine the computerized functions of two or more systems.
  • One or more computers may maintain expression analysis database 122, expression mining database 124, and querying and mining system 126 independent of the computers operating the systems of Fig. 1.
  • Fig. 2A depicts a simplified block diagram of a representative host computer system 210 in a particular embodiment according to the present invention.
  • Host computer system 210 includes a bus 212 which interconnects major subsystems such as a central processor 214, a system memory 216 (typically RAM), an input/output (I/O) adapter 218, an external device such as a display screen 224 via a display adapter 226, a keyboard 232 and a mouse 234 via an I/O adapter 218, a SCSI host adapter 236, and a removable disk drive 238 operative to receive a removable disk 240.
  • SCSI host adapter 236 may act as a storage interface to a fixed disk drive 242 or a CD-ROM player 244 operative to receive a CD-ROM 246.
  • Fixed disk 242 may be a part of host computer system 210 or may be separate and accessed through other interface systems.
  • a network interface 248 may provide a direct connection to a remote server via a telephone link or to the Internet.
  • Network interface 248 may also connect to a local area network (LAN) or other network interconnecting many computer systems. Many other devices or subsystems (not shown) may be connected in a similar manner.
  • Fig. 2B depicts a simplified diagram of a network 260 interconnecting multiple computer systems 210a-210e. This diagram is merely an illustration and should not limit the scope of the claims herein.
  • Network 260 may be a local area network (LAN), wide area network (WAN), etc.
  • Bioinformatics database 102 and the computer-related operations of the other elements of Fig. 1 may be divided amongst computer systems 210 in any way with network 260 being used to communicate information among the various computers.
  • Portable storage media such as removable disks may be used to carry information between computers instead of network 260.
  • Expression mining database 124 is preferably a multidimensional relational database with a complex internal structure. However, other types of databases can also be used in select embodiments without departing from the scope of the present invention.
  • a representative table 302 includes one or more key attributes 304 and one or more non-key attributes 306.
  • Representative table 302 includes one or more records where each record includes fields corresponding to the listed attributes. The contents of the key fields taken together identify an individual record.
  • each table is represented by a rectangle divided by a horizontal line. The fields or attributes above the line are key attributes, while the fields or attributes below the line are non-key attributes.
  • An identifying relationship 308 signifies that the key attribute of a parent table 310 is also a part of a composite key attribute of a child table 312.
  • a non-identifying relationship 314 signifies that the key attribute of a parent table 316 is also a non-key attribute of a child table 318.
  • Foreign keys, denoted by (FK), comprise attributes of one table that are either a key or a part of a composite key of another table. For both the non-identifying and the identifying relationship, one record in the parent table corresponds to one or more records in the child table.
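The two relationship kinds can be made concrete with a small SQL schema, run here through Python's sqlite3 module. This is only a sketch: the table and column names are hypothetical and are not taken from the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE parent (
        parent_id INTEGER PRIMARY KEY
    );
    -- Identifying relationship: the parent key is part of the child's composite key.
    CREATE TABLE identifying_child (
        parent_id INTEGER NOT NULL REFERENCES parent(parent_id),
        line_no   INTEGER NOT NULL,
        note      TEXT,
        PRIMARY KEY (parent_id, line_no)
    );
    -- Non-identifying relationship: the parent key is a non-key attribute of the child.
    CREATE TABLE non_identifying_child (
        child_id  INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent(parent_id),
        note      TEXT
    );
""")

# One parent record corresponds to one or more child records.
conn.execute("INSERT INTO parent VALUES (1)")
conn.executemany("INSERT INTO identifying_child VALUES (?, ?, ?)", [(1, 1, "a"), (1, 2, "b")])
conn.execute("INSERT INTO non_identifying_child VALUES (10, 1, 'c')")

def key_columns(table):
    """Return the primary-key column names of a table, in key order."""
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    keyed = [(row[5], row[1]) for row in info if row[5] > 0]  # row[5] is the pk position
    return [name for _, name in sorted(keyed)]
```

In the identifying case the foreign key `parent_id` appears above the line (in the key); in the non-identifying case it appears below the line.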
  • Fig. 4A illustrates a simplified entity relationship diagram (ERD) of elements of expression mining database 124 in a particular embodiment according to the present invention.
  • Fig. 4A is merely an illustration and should not limit the scope of the claims herein.
  • Rectangles in Fig. 4A correspond to tables in expression mining database 124. For each rectangle, the title of the table is listed above the rectangle. Within each rectangle, columns of the table are listed. Above a horizontal line within each rectangle are listed key columns, columns whose contents are used to identify individual records in the table. Below this horizontal line are the names of non-key attributes. The lines between the rectangles identify the relationships between records of one table and records of another table. First, the relationships among the various tables will be described. Then, the contents of each table will be discussed in detail.
  • expression mining database 124 is updated during mining operations. Certain tables are updated by importation and transformation from expression analysis database 122. Certain other tables may be updated as an operator of querying and mining system 126 defines a query operation. It can be useful to identify genes or ESTs whose expression varies in some way depending on one or more tissue attributes. Therefore, it is necessary for querying and mining system 126 to have awareness of tissue attributes associated with expression analysis results. One or more analysis results are typically associated with what is herein referred to as "leaf target samples."
  • a "raw sample" represents a piece of extracted tissue. Before further processing, a single raw sample may be cleaved into multiple raw samples. The raw samples are the input to sample preparation system 114. For each raw sample, sample preparation system 114 prepares a so-called "target" which is a fluid including mRNA or other expression indicator. A "target" may be split into multiple "replicates" and replicates may be pooled to form another target. The individual "targets" that are applied to chips are the leaf target samples. Each application of a "leaf target sample" to a chip represents an experiment. In a presently preferable embodiment, expression analyses can be conducted on experiment data according to one or more selectable criteria to produce experimental analysis result data.
  • the tables of expression mining database 124 that relate to samples and attributes are identified in Fig. 4A by the letter "A."
  • Leaf target samples, raw samples, replicates, targets, etc. are listed in a sample item table 402.
  • a sample item derivation table 404 lists transformations from one sample item to another.
  • a sample derivation type table 406 lists the various types of transformation.
  • the various sample item types themselves, e.g., target, replicate, raw sample, leaf target sample, analyses and the like, are listed in a sample item type table 408. Listing the sample derivation types and sample item types allows easy reprogramming to accommodate changes in sample processing procedures.
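The sample item and sample item derivation tables together describe a derivation graph. The following sketch models that graph in plain Python and finds the leaf items and the ancestry of an item. The item names and derivation type names are hypothetical, and treating "an item with no further derivation" as a leaf is an assumption made for the example.

```python
from collections import defaultdict

# Rows of a hypothetical derivation table: (parent item, child item, derivation type).
derivations = [
    ("raw1", "raw1a", "cleave"),
    ("raw1", "raw1b", "cleave"),
    ("raw1a", "target1", "prepare"),
    ("raw1b", "target2", "prepare"),
    ("target1", "rep1", "split"),
    ("target1", "rep2", "split"),
    ("rep2", "pool1", "pool"),
    ("target2", "pool1", "pool"),
]

children = defaultdict(set)
parents = defaultdict(set)
for parent, child, _kind in derivations:
    children[parent].add(child)
    parents[child].add(parent)

def leaf_items():
    """Items that are never the parent of another item."""
    all_items = set(children) | set(parents)
    return sorted(item for item in all_items if not children.get(item))

def ancestors(item):
    """All items from which `item` was derived, directly or indirectly."""
    seen = set()
    stack = [item]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)
```

Walking the ancestry of a leaf item is what lets attributes recorded on a raw sample, such as organ or disease state, be associated with the experiments run on its leaf target samples.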
  • Associated with samples are attributes. Some of the attributes are strings or values identifying concentrations, sample preparation dates, expiration dates, and the like. Other attributes identify characteristics that are highly useful in searching for genes or ESTs of interest such as the disease state of tissue, the organ, or species from which a sample is extracted. Attributes are listed in a sample item attribute table 410.
  • a sample item attribute map table 412 implements a many-to-many relationship between sample item attribute table 410 and sample item table 402. A sample item may have more than one attribute, and an attribute can describe more than one sample item.
  • Each attribute has an associated attribute type listed in a sample item attribute type table 414 and an associated value for the attribute.
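A many-to-many mapping of this kind is conventionally implemented with a junction table. The sketch below uses Python's sqlite3 module, with table names that loosely follow the tables described above. The column names and sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample_item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sample_item_attribute_type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sample_item_attribute (
        id      INTEGER PRIMARY KEY,
        type_id INTEGER REFERENCES sample_item_attribute_type(id),
        value   TEXT
    );
    -- Junction table: one row per (sample item, attribute) pairing.
    CREATE TABLE sample_item_attribute_map (
        sample_item_id INTEGER REFERENCES sample_item(id),
        attribute_id   INTEGER REFERENCES sample_item_attribute(id),
        PRIMARY KEY (sample_item_id, attribute_id)
    );
    INSERT INTO sample_item VALUES (1, 'leaf-1'), (2, 'leaf-2'), (3, 'leaf-3');
    INSERT INTO sample_item_attribute_type VALUES (1, 'organ'), (2, 'disease state');
    INSERT INTO sample_item_attribute VALUES
        (1, 1, 'liver'), (2, 1, 'kidney'), (3, 2, 'tumor'), (4, 2, 'healthy');
    INSERT INTO sample_item_attribute_map VALUES
        (1, 1), (1, 3), (2, 1), (2, 4), (3, 2), (3, 3);
""")

def samples_with(attribute_type, value):
    """Names of sample items carrying the given attribute type and value."""
    rows = conn.execute("""
        SELECT s.name
        FROM sample_item s
        JOIN sample_item_attribute_map m ON m.sample_item_id = s.id
        JOIN sample_item_attribute a ON a.id = m.attribute_id
        JOIN sample_item_attribute_type t ON t.id = a.type_id
        WHERE t.name = ? AND a.value = ?
        ORDER BY s.name
    """, (attribute_type, value))
    return [name for (name,) in rows]
```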
  • Examples of attribute types are "concentration," "preparation date," "expiration date," etc.
  • Another example of an attribute type would be "specimen type" where possible values would correspond to "tissue," "organ culture," "purified cells," "primary cell culture," "established cell line," and the like.
  • Another example might be "ethnic group" where different values may correspond to "East Asian" or "Native American," for example.
  • Many attribute types may be understood to be derived from other attribute types.
  • the attribute type "ethnic group" may be derived from an attribute type "human," which is in turn derived from an attribute type "species."
  • Some attribute types have no associated attributes but rather define levels of categorization.
  • The derivations relating a "parent" attribute type to a "child" attribute type are listed in an attribute type derivation type table 418. Any attribute type may have one or more parents or children. Different types of derivation are listed in an attribute type derivation type table 420.
  • One representative attribute type derivation type is category-subcategory where the parent type represents a category and the child type represents the subcategory.
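A category-subcategory hierarchy of attribute types can be traversed to expand a query on a category into all of its subcategories. The sketch below is illustrative only: the list of derivations is hypothetical, and the "mouse" entry is invented for the example.

```python
# Rows of a hypothetical derivation table: (parent type, child type, derivation type).
attribute_type_derivations = [
    ("species", "human", "category-subcategory"),
    ("human", "ethnic group", "category-subcategory"),
    ("species", "mouse", "category-subcategory"),
]

def descendant_types(root):
    """All attribute types derived from `root`, in breadth-first order."""
    found = []
    queue = [root]
    while queue:
        current = queue.pop(0)
        for parent, child, _kind in attribute_type_derivations:
            if parent == current and child not in found:
                found.append(child)
                queue.append(child)
    return found

species_subtypes = descendant_types("species")
```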
  • An experiment table 424 lists experiments whose results are available for querying.
  • a data map table 426 lists entries corresponding to sets of genes or ESTs to be investigated. Each set corresponds to a collection of experiments performed to investigate the genes in the set.
  • An experiment set table 428 lists associations between experiments and entries in data map table 426 and thus defines the collection of experiments corresponding to each gene set.
  • An analysis set table 430 defines sets of analyses that have been performed corresponding to each gene set. Each entry defines an association between an analysis, an experiment and an entry in data map table 426.
  • a gene set table 432 defines membership in all sets of genes that have been defined by users to prepare for querying and mining operations or have been otherwise defined.
  • a gene set name table 434 lists names for the gene sets. Genes belonging to gene sets are listed in a bio-item accession table 436. Each entry in bio-item accession table 436 identifies an accession number in a bio-item database. Definitions for accession numbers are stored in an accession definition table 438.
  • a housekeeping genes table 440 lists genes with known expression level that are used to calibrate the expression monitoring process.
  • Tables related to analysis information are denoted with the letter "D."
  • Absolute expression analysis results are stored in an absolute result table 444. Each entry in absolute result table 444 references an absolute result type. Different absolute result types may include, e.g., present, marginal, absent, and unknown, indicating an estimate of the expression level of a given gene or EST.
  • the various absolute result types are listed in an absolute result type table 446.
  • Relative analysis results are stored in a relative analysis result table 448. Each entry in relative analysis result table 448 references a relative result type listed in a relative result type table 450. Relative analyses compare expression of a gene in two experiments. Different relative result types may include, e.g., increased, no change, decreased, and unknown, all describing the change of expression.
  • Tables 448 and 450 are imported from expression analysis database 122 and are read-only from the viewpoint of querying and mining system 126.
  • Querying and mining system 126 also performs various expression analysis operations. Results of these calculations are maintained in a calculated fields table 452.
  • Tables related to mining and querying operations are denoted with the letter "E."
  • a user considers data from a collection of experiments. A list of the sample items which were used for these experiments is stored in a selected sample item table 454. Selected sample item table 454 is typically much smaller than sample item table 402, which can make query operations faster.
  • Each entry in a criteria set table 456 identifies a set of criteria used to query a group selected by sample item or by attribute.
  • Each entry in a criteria set experiment table 458 identifies a set of criteria applied to gene or EST expression levels of a particular sample item belonging to a group identified by reference to criteria set table 456.
  • a criteria set experiment detail table 460 includes entries identifying values to be applied as criteria.
  • a user of querying and mining system 126 does not have access to information about leaf target samples but rather only about their "parents." The expression data can be recorded concerning the leaf target samples.
  • Entries in criteria set experiment table 458 can be associated with sample items in sample item table 402 and leaf target samples corresponding to these sample items by means of a criteria set experiment leaf table 462.
  • A user preferences table 464 stores references to user preference files that record the preferences of individual users of querying and mining system 126. Users may wish to store functions used for normalization of expression data for later use.
  • A normalization adjustment function table 466 lists information about normalization and other transformation functions. Users may wish to store functions used to average expression data collected from related replicates. Descriptions of these averaging functions are stored in a replicate average function table 468.
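  • By way of illustration only, a stored normalization function and a stored replicate averaging function might resemble the following. This is a sketch under stated assumptions: scaling to a target mean and a per-gene arithmetic mean are assumed choices, and the function names and the `target_mean` parameter are hypothetical.

```python
from statistics import mean


def scale_to_target(levels, target_mean=100.0):
    """Scale expression levels so that their mean equals target_mean.

    Scaling to a common mean is an assumed normalization, used here only
    as an example of a function a user might store.
    """
    current = mean(levels.values())
    if current == 0:
        raise ValueError("cannot scale levels whose mean is zero")
    factor = target_mean / current
    return {gene: value * factor for gene, value in levels.items()}


def average_replicates(replicates):
    """Average each gene over the replicates that measured it."""
    genes = set().union(*(r.keys() for r in replicates))
    return {g: mean(r[g] for r in replicates if g in r) for g in sorted(genes)}
```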
  • Fig. 5A illustrates a flowchart 501 of simplified process steps in a particular representative embodiment according to the invention for mining a plurality of experiment information for a pattern.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • In a step 502, information from experiments and chip designs is collected. Then, in a step 504, experimental analyses to mine are selected. In a step 506, one or more sample attributions are defined. In a step 508, resulting information is determined from the experimental analyses by mining to form a plurality of resulting information. This resulting information can include one or more resulting gene sets.
  • A step 510 formats the resulting information for viewing by a user. The combination of these steps can provide the user with the ability to access experiment information.
  • Fig. 5B illustrates a flowchart 503 of simplified process steps in an alternative embodiment according to the invention for working with expression information.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • Information about a plurality of results of a plurality of experimental analyses is collected.
  • Information about samples and information about the plurality of experiments is gathered.
  • One or more attributes are added to the information about the experiments.
  • The plurality of results of experiments information is transformed to form a plurality of transformed information. Transformation can comprise normalization, denormalization, scaling, aggregation, and the like.
  • The plurality of transformed information is mined.
  • The results of the mining are visualized for display to the user.
  • Conclusions are recorded.
  • Block flow diagram 601 includes an input data warehouse 602, a transformation step 604 to produce an output data mart 606 and a mining process step 608.
  • Input data warehouse 602 can comprise a laboratory information management system and other databases.
  • Data warehouse 602 in a particular embodiment can include genomic information and chip design information, as well as other useful information in the laboratory expression analysis process.
  • Fig. 6B illustrates a simplified block diagram of a representative data warehouse such as data warehouse 602 of Fig. 6A in a particular embodiment according to the present invention.
  • Data warehouse 602 comprises a laboratory information management system 610 and a plurality of published databases including published database 612.
  • A chip design component 614 can also be included in data warehouse 602.
  • A genomic information component 616 can also be a part of data warehouse 602.
  • Other reference databases 618 can also be part of data warehouse 602.
  • Many embodiments can also include other information or may omit any of these particular components without departing from the scope of the present invention.
  • Data transformation step 604 of Fig. 6A can comprise in a particular embodiment according to the present invention a normalization and adjustment step. Normalization and adjustment can include functions tracked by analysis type and/or functional type. In some embodiments, a VBA function or independent applet can be added or removed. Additionally, in many embodiments, a user may selectively omit some transformations according to a preference. Data transformation step 604 can include a replicate step in which a user can manipulate replicates in ways similar to normalizations and adjustments. Further, in many embodiments a user can identify derivation-type replicates using a sample identification. Yet further, in some embodiments, custom selection of replicates can be embedded in an applet.
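  • By way of illustration only, a transformation step in which functions can be added or removed, and in which a user may selectively omit some transformations, might be organized as an ordered registry. The sketch below is written in Python rather than as a VBA function or applet; the names `clip_negative`, `scale_by_max`, `REGISTRY`, and `apply_transforms` are hypothetical.

```python
def clip_negative(levels):
    """Adjustment: replace negative expression levels with zero."""
    return [max(0.0, v) for v in levels]


def scale_by_max(levels):
    """Normalization: divide every level by the largest level."""
    top = max(levels, default=0.0)
    return [v / top for v in levels] if top > 0 else list(levels)


# Ordered registry of (name, function); entries can be added or removed.
REGISTRY = [("clip_negative", clip_negative), ("scale_by_max", scale_by_max)]


def apply_transforms(levels, registry=REGISTRY, omit=()):
    """Apply each registered transformation in order, skipping names in omit."""
    result = list(levels)
    for name, fn in registry:
        if name not in omit:
            result = fn(result)
    return result
```

  • A user preference to skip a transformation would then be expressed as, for example, `apply_transforms(levels, omit={"clip_negative"})`.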
  • Fig. 6C illustrates a representative data mart such as data mart 606 of Fig. 6A in a particular embodiment according to the present invention.
  • Representative data mart 606 can comprise an experiment collection 620. Information and results of the experiment collection can be forwarded to an expression result 622.
  • A plurality of samples 624, which can have one or more sample attributes, can further have a relationship to expression result 622.
  • A plurality of genes 626 can also be included in data mart 606.
  • Time may be treated as a dimension 628 of expression result 622.
  • Experiments can be added to or removed from experiment collection 620. Further, in many embodiments, the same experiment collection can be mined for a plurality of purposes. Yet further, experiment collection 620 can be subdivided into one or more subsets of experiments to be mined.
  • Fig. 6D illustrates a representative organization of samples and targets such as samples 624 of Fig. 6C in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Samples and targets can allow a user to describe stages of an experiment.
  • At a top level is a raw sample.
  • Fig. 6D illustrates sample 624 that comprises a raw sample 630. Below the raw sample are one or more replicates. Two replicates, a replicate 632 and a replicate 634, comprise raw sample 630.
  • Replicates can comprise targets.
  • Replicate 632 is a target treated with a drug A.
  • Replicate 634 is a target treated with drug B.
  • One or more leaf targets can comprise a target.
  • Leaf targets 636, 638, 640 and 642 comprise target 632.
  • Leaf targets 644, 646, 648 and 650 comprise target 634.
  • Experimental analyses can be associated with the leaf targets.
  • Fig. 6D illustrates an experimental analysis 652 and an experimental analysis 654 associated with leaf target 632.
  • Experimental analyses can be recursively defined, i.e., an experimental analysis can comprise one or more experimental analyses.
  • Intermediate levels can be defined by the user. Other levels can be included and other organizations may be used without departing from the scope of the claims of the present invention.
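  • By way of illustration only, the organization of raw samples, targets, and leaf targets described above might be represented as a tree whose leaves carry experimental analyses. The sketch below is one possible representation; the class `SampleNode` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SampleNode:
    """One node in a sample hierarchy: raw sample, target, or leaf target."""
    name: str
    children: List["SampleNode"] = field(default_factory=list)
    analyses: List[str] = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def leaves(self):
        """Return the nodes with no children, in depth-first order."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


# Example mirroring part of Fig. 6D: raw sample 630, target 632, four leaf targets.
raw = SampleNode("raw sample 630")
target = raw.add(SampleNode("target 632"))
for number in (636, 638, 640, 642):
    target.add(SampleNode(f"leaf target {number}"))
```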
  • Fig. 6E illustrates another representative organization of samples and targets such as samples 624 of Fig. 6C in a particular embodiment according to the present invention.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • Fig. 6E illustrates a raw sample 670 that represents a piece of extracted tissue, for example.
  • Raw sample 670 has been cleaved into multiple raw samples, such as raw samples 672, 673 and 674.
  • The raw samples are the input to sample preparation system 114 of Fig. 1.
  • Sample preparation system 114 prepares targets, such as target 676 corresponding to raw sample 672.
  • The target can be a fluid including mRNA or other expression indicator.
  • Target 676 has been split into multiple replicates, such as replicates 677, 678 and 679.
  • Replicates 678 and 680 have been pooled to form another target, target 682.
  • The individual "targets" that are applied to chips are the leaf target samples.
  • Each application of a "leaf target sample" to a chip represents an experiment.
  • Leaf target sample 684 is an example.
  • One or more experimental analyses can be associated with a particular leaf target sample.
  • Analyses 686 and 688 are associated with leaf target sample 684.
  • An experimental analysis can be defined in terms of one or more other experimental analyses.
  • Fig. 6F illustrates a representative organization of a plurality of attributes such as attribute 628 of Fig. 6C in a particular embodiment according to the present invention.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • Fig. 6F illustrates a plurality of attributes having a non-hierarchical structure. In a presently preferable embodiment, an unlimited number of attributes can be assigned to any particular sample. Yet further, different samples can have the same attributes.
  • Fig. 6F illustrates an organism species 660 having a relationship with a plurality of attributes such as human attribute 662, mouse attribute 664, corn attribute 666 and yeast attribute 668.
  • The "strain" and "race" windows are examples of attributes. Other arrangements and attributes can be used in various embodiments without departing from the scope of the claims of the present invention.
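  • By way of illustration only, a non-hierarchical assignment of attributes, in which any number of attributes can be assigned to a sample and different samples can share attributes, might be kept as a mapping from samples to sets of (attribute type, value) pairs. The class `AttributeIndex` and its method names are hypothetical.

```python
from collections import defaultdict


class AttributeIndex:
    """Maps each sample to an unordered set of (attribute type, value) pairs."""

    def __init__(self):
        self._by_sample = defaultdict(set)

    def assign(self, sample, attribute_type, value):
        """Assign an attribute to a sample; repeated assignments are harmless."""
        self._by_sample[sample].add((attribute_type, value))

    def samples_with(self, attribute_type, value):
        """Return, in sorted order, the samples carrying the given attribute."""
        wanted = (attribute_type, value)
        return sorted(s for s, attrs in self._by_sample.items() if wanted in attrs)

    def attributes_of(self, sample):
        """Return the sorted attributes of one sample (empty if unknown)."""
        return sorted(self._by_sample.get(sample, set()))
```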
  • Genes 626 of Fig. 6C can be combined into one or more gene sets.
  • Gene sets can be described by various users and in at least one particular embodiment are not shared among users, but can be shared by users in other embodiments.
  • A user can copy other users' gene sets and can edit or delete gene sets.
  • Gene sets can be created or saved during mining of the data mart.
  • One or more functional operations, such as logical operations like union and intersection and arithmetic operations such as addition, subtraction, scaling, and the like, can be applied to gene sets.
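  • By way of illustration only, logical operations on gene sets map directly onto set operations. In the sketch below the function `combine_gene_sets` and the operation names are hypothetical, and set difference is shown as one assumed reading of subtraction.

```python
def combine_gene_sets(operation, first, second):
    """Combine two gene sets with a named logical operation."""
    a, b = frozenset(first), frozenset(second)
    if operation == "union":
        return a | b
    if operation == "intersection":
        return a & b
    if operation == "difference":
        return a - b
    raise ValueError(f"unsupported operation: {operation!r}")
```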
  • Fig. 7A illustrates a representative experiment collection screen 701 of a user interface in a particular embodiment according to the present invention.
  • Screen 701 enables a user to interact with an experiment collection comprised in expression mining database 124 of Fig. 1.
  • Screen 701 comprises an experiment collection selection tab 702 shown with four experiment collections, such as experiment collections 704 and 706. Other experiment collections can be added as needed.
  • Other formats can also be used for presenting this information to a user in various embodiments according to the present invention.
  • Fig. 7B illustrates an experiment selection screen 703 in a particular embodiment according to the present invention.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • Experiment selection screen 703 comprises an experiment tab 730.
  • A plurality of experiments is indicated in two scrolling windows, an experiments selected window 734 and an experiments available window 736.
  • Selection buttons 738a and 738b enable various experiments to be moved between experiment scrolling selection windows 734 and 736.
  • Experiment selection window 736 includes a plurality of experiments.
  • One or more filters may be applied to the experiment data to limit the number of experiments depicted in experiment scrolling selection windows 734 and 736 using the filter mechanism at the bottom of the screen.
  • The filter mechanism 744 comprises a column selection field 746 and a selection value input field 748.
  • A user may select a particular field for which to screen experiments using column selection field 746 and then enter a desired value in value input field 748. Then, by clicking filter button 750, the user can apply the filter to the experiments in the collection so that only experiments in which the column is set to the selected value will be depicted in experiment selection scroll windows 734 and 736.
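  • By way of illustration only, the column and value filter described above amounts to keeping the experiments whose selected column equals the entered value. The sketch assumes each experiment is a dictionary of column names to values and compares values as text, since the value comes from an input field; `filter_experiments` is a hypothetical name.

```python
def filter_experiments(experiments, column, value):
    """Keep only experiments whose `column` matches `value` (compared as text)."""
    wanted = str(value)
    return [e for e in experiments if column in e and str(e[column]) == wanted]
```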
  • Fig. 7C illustrates a selected experiment collection screen 705 having an analysis tab 751 in a particular embodiment according to the present invention.
  • Screen 705 comprises two scrolling selection windows, an analyses selected window 752 and an analyses available window 754.
  • Selection keys 756 and 758 may be used to move various analyses between scrolling selection windows 752 and 754.
  • A filter mechanism provided at the bottom of screen 705 enables a user to screen the analyses depicted in scrolling selection windows 752 and 754 by selecting a particular column using column selection field 760 and inputting a desired value into value input field 762 and then clicking filter button 764 to apply the filter to the analyses in the experiment collection.
  • Fig. 7D illustrates a representative sample selection screen 707 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives.
  • Screen 707 enables a user to view the results of selections made on one or more samples.
  • Screen 707 comprises a plurality of selections including a sample selection 770, a sample-type selection 771 and an attribute-type selector 772.
  • A previous/next button pair 774 and a select button 775 enable searching and selecting, respectively.
  • Fig. 7E illustrates a representative sample and attribute management screen 709 in a particular embodiment according to the present invention.
  • Screen 709 comprises a samples and attributes section 722 and a relationships section 724.
  • Item selection window 776 of sample and attribute section 722 provides functions that enable the user to select the type of new item, sample, attribute, and the like.
  • Function buttons 777 enable the user to select operations such as add new, rename, delete and the like. If the user elects to create a new item, then screen 711 of Fig.
  • Screen 711 enables the user to create new items.
  • The user can enter a name for the item in a new item field 780 and an item type in item type field 784 of screen 711. Otherwise, the user can work with relationships using the relationship section 724 in screen 709 of Fig. 7E.
  • Relationship selection window 778 of relationship section 724 enables the user to select the type of relationship, such as a relationship between sample item to sample item, a relationship between attribute and sample item or a relationship between attribute type to attribute type, for example.
  • Function buttons 779 enable the user to select operations such as add new, delete and the like. If the user elects to create a new relationship, then screen 713 of Fig. 7F is displayed. Screen 713 enables the user to create new relationships. The user can enter a source of the relationship in a source window 782, a parent in the parent window 786 and a type of relationship in the derivation type window 788.
  • Fig. 7G illustrates a representative data mining option management screen 715 in a particular embodiment according to the present invention.
  • Screen 715 illustrates a plurality of tabs, including a queries and charts tab 790, a patterns tab 792 and a gene set comparison tab 794. The user can specify some grouping parameters using the group by functions of queries and charts tab 790 in order to begin data mining.
  • Fig. 7H illustrates an experiment mining screen 717 in a particular embodiment according to the present invention.
  • Screen 717 includes a plurality of sample items, such as sample item 796.
  • A group selection field 798 enables a user to select from a plurality of groups in the experiment collection.
  • One or more gene sets can be selected using the gene set selection field 800.
  • Gene sets can be all genes represented by a particular gene chip, or a subset.
  • A default of all gene sets on a particular gene chip is provided in one particular embodiment, but other defaults can be used.
  • A presence measure of the gene expression within the group can be specified using the expression percentage field 802. When the user has specified the search parameters using these fields, pressing the execute button 801 starts the data mining.
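  • By way of illustration only, one plausible reading of the expression percentage field is a minimum percentage of experiments in the selected group in which a gene must be called present. The sketch below implements that assumed reading; the function name, the argument layout, and the use of the string "present" are hypothetical.

```python
def genes_present_in_group(calls, gene_set, min_percent):
    """Return genes called "present" in at least min_percent of experiments.

    `calls` maps experiment name -> {gene: absolute result type string}.
    """
    total = len(calls)
    if total == 0:
        return []
    hits = []
    for gene in sorted(gene_set):
        present = sum(1 for per_gene in calls.values()
                      if per_gene.get(gene) == "present")
        if 100.0 * present / total >= min_percent:
            hits.append(gene)
    return hits
```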
  • Fig. 7I illustrates a selected data screen 719 in a particular embodiment according to the present invention.
  • Data selection screen 719 illustrates the data that meet the criteria specified by the user in experiment mining screen 717 of Fig. 7H.
  • Data selection screen 719 illustrates a plurality of leaf parents, including leaf parent 804.
  • Screen 719 also illustrates experiment replications 805, bio items 806 and results measured 807 during the experiment for each leaf parent. Users can export the results of the mining using export button 808 and/or can save the results of the mining using save gene set button 809.
  • Fig. 7J illustrates a bar chart visualization screen 721 in a particular embodiment according to the present invention.
  • Bar chart visualization screen 721 comprises a display area having a display of the data in the experiment collection.
  • A quantity to be visualized can be selected from select value field 814.
  • Experimental results 810 and 812 indicate differences in expression for a particular gene for the quantity selected by the user with field 814.
  • Fig. 7K illustrates a scatter plot visualization screen 723 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein.
  • One of ordinary skill in the art would recognize other variations, modifications and alternatives.
  • Scatter plot selection visualization screen 723 comprises a display area 819 having a display of the data in the experiment collection. While display area 819 illustrates an X-Y plot, other forms of data visualization, such as bar charts, graphs, pie-charts and the like, are contemplated by various embodiments according to the present invention.
  • Fig. 7L illustrates pattern search screens 725 and 727 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Gene pattern searching enables the user to determine relationships such as which genes behave similarly when exposed to a certain drug, and the like.
  • Selecting the "pattern" tab on screen 725 displays information entry devices for entering search criteria, including a gene patterns field 820. By specifying search on gene patterns, the user can be presented with gene pattern search screen 727. The user can select a plurality of gene sets to compare using gene set name fields 822 and 824. A measurement selection field 826 enables the user to select a measurement of interest as a basis of the comparison.
  • Fig. 7M illustrates gene set comparison screens 729 and 731 in a particular embodiment according to the present invention.
  • Gene set comparisons enable the user to determine relationships such as which gene sets include particular genes, exclude particular genes, or functional combinations of genes. Selecting the "gene set comparison" tab on screen 729 displays information about gene sets that can be selected by the user for comparison.
  • Screen 729 illustrates a plurality of gene sets, including gene sets 830, 832 and 834. After specifying gene sets of interest, the user can be presented with gene comparison screen 731. The user can select a plurality of genes as bases of comparing the gene sets selected in screen 729 by checking one or more of a plurality of selection windows, such as selection windows 836 and 838.
  • Figs. 7N-7O illustrate sample data management screens 733 and 735 in particular embodiments according to the present invention. These diagrams are merely illustrations and should not limit the scope of the present invention. One of ordinary skill in the art would recognize other variations, modifications and alternatives.
  • Fig. 7N illustrates gene set management screen 733. This screen enables the user to perform a variety of tasks with genes and gene sets, such as add, remove, create and copy gene sets, and add and remove genes within gene sets, and the like.
  • Fig. 7O illustrates update gene set screen 735. This screen enables the user to specify one or more genes to be removed from the database.
  • One embodiment of the present invention operates in the context of a system for analyzing biological or other materials using arrays that themselves include probes that may be made of biological materials such as RNA or DNA.
  • The VLSIPS™ and GeneChip™ technologies provide methods of making and using very large arrays of polymers, such as nucleic acids, on very small chips. See U.S. Patent No. 5,143,854 and PCT Patent Publication Nos. WO 90/15070 and 92/10092, each of which is hereby incorporated by reference for all purposes.
  • Nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the "target" nucleic acid).
  • Probes need not be nucleic acid probes but may also be other polymers such as peptides.
  • Peptide probes may be used to detect the concentration of peptides, polypeptides, or polymers in a sample. The probes should be carefully selected to have bonding affinity to the compound whose concentration they are to be used to measure.
  • Fig. 1' illustrates an overall system 100' for forming and analyzing arrays of biological materials such as RNA or DNA.
  • A chip design system 104' is used to design arrays of polymers, such as biological polymers like RNA or DNA.
  • Chip design system 104' may be, for example, an appropriately programmed Sun Workstation or personal computer or workstation, such as an IBM PC equivalent, including appropriate memory and a CPU.
  • Chip design system 104' obtains inputs from a user regarding chip design objectives including characteristics of genes of interest, and other inputs regarding the desired features of the array.
  • Chip design system 104' may obtain information regarding a specific genetic sequence of interest from bioinformatics database 102' or from external databases such as GenBank.
  • The output of chip design system 104' is a set of chip design computer files in the form of, for example, a switch matrix, as described in PCT application WO 92/10092, and other associated computer files.
  • Systems for designing chips for sequence determination and expression analysis are disclosed in U.S. Patent No. 5,571,639 and in PCT application WO 97/10365, the contents of which are herein incorporated by reference.
  • The chip design files are input to a mask design system (not shown) that designs the lithographic masks used in the fabrication of arrays of molecules such as DNA.
  • The mask design system designs the lithographic masks used in the fabrication of probe arrays.
  • The mask design system generates mask design files that are then used by a mask construction system (not shown) to construct masks or other synthesis patterns such as chrome-on-glass masks for use in the fabrication of polymer arrays.
  • The masks are used in a synthesis system (not shown).
  • The synthesis system includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip.
  • The synthesis system includes a light source and a chemical flow cell on which the substrate or chip is placed. A mask is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times for deprotection of selected regions of the chip. Selected chemical reagents are directed through the flow cell for coupling to deprotected regions, as well as for washing and other operations.
  • The substrates fabricated by the synthesis system are optionally diced into smaller chips.
  • The output of the synthesis system is a chip ready for application of a target sample. Information about the mask design, mask construction, and probe array synthesis systems is presented by way of background.
  • A biological source 112' is, for example, tissue from a plant or animal.
  • Various processing steps are applied to material from biological source 112' by a sample preparation system 114'. These steps may include isolation of mRNA and precipitation of the mRNA to increase concentration. The result of the various processing steps is a target sample ready for application to the chips produced by the synthesis system 110'.
  • Sample preparation methods for expression analysis are discussed in detail in WO97/10365.
  • The prepared samples include nucleic acid sequences such as RNA or DNA. When the sample is applied to the chip by a sample exposure system 116', the nucleic acids in the sample may or may not bond to the probes.
  • The nucleic acids have been tagged with fluorescein labels to determine which probes have bonded to nucleic acid sequences from the sample.
  • The prepared samples will be placed in a scanning system 118'.
  • Scanning system 118' includes a detection device such as a confocal microscope or CCD (charge-coupled device) that is used to detect the location where labeled receptors have bound to the substrate.
  • The output of scanning system 118' is one or more image files indicating, in the case of fluorescein-labeled receptor, the fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate.
  • The image files and the design of the chips are input to an analysis system 120' that, e.g., calls base sequences, or determines expression levels of genes or expressed sequence tags.
  • The expression level of a gene or EST is herein understood to be the concentration within a sample of mRNA or protein that would result from the transcription of the gene or EST.
  • Such analysis techniques are disclosed in WO97/10365 and U.S. App. No. 08/531,137, the contents of which are herein incorporated by reference.
  • An expression analysis database 122' maintains information used to analyze expression and the results of expression analysis.
  • Contents of expression analysis database 122' may include tables listing analyses performed, analysis results, experiments performed, sample preparation protocols and parameters of these protocols, chip designs, etc. Details of one embodiment of expression analysis database 122' are described in U.S. Patent Application No. 09/122,167, entitled METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE, filed on July 24, 1998, the contents of which are incorporated herein by reference for all purposes.
  • One or more instantiations of expression analysis database 122' may contain information concerning the expression of many genes or ESTs as collected from many different tissue samples. It would be useful to use this information to investigate questions such as, e.g., 1) which genes or ESTs are upregulated (expressed more) or downregulated (expressed less) in diseased tissue, 2) how does gene expression vary among organs and tissue types within a species, 3) how does gene expression vary among species which share common genes, 4) how does gene expression respond to various disease treatment regimes, 5) how does gene expression vary with progression of disease, etc.
  • An expression mining database may contain information concerning the expression of many genes or ESTs as collected from many different tissue samples. It would be useful to use this information to investigate questions such as, e.g., 1) which genes or ESTs are upregulated (expressed more) or downregulated (expressed less) in diseased tissue, 2) how does gene expression vary among organs and tissue types within a species, 3) how does gene expression vary among species which share common genes, etc.
  • Expression mining database 124' may include duplicate representations of data in expression analysis database 122'. Expression mining database 124' may also include various tables to facilitate mining operations conducted by a user who operates a querying and mining system 126'. Querying and mining system 126' includes a user interface that permits an operator to make queries to investigate expression of genes and ESTs and answer the types of questions identified above. An example of a querying and mining system is described in a commonly owned U.S. Patent Application No. 09/122,434, entitled GENE EXPRESSION AND EVALUATION SYSTEM, filed July 24, 1998.
  • Chip design system 104', analysis system 120' and control portions of exposure system 116', sample preparation system 114', and scanning system 118' may be appropriately programmed computers such as a Sun workstation or IBM-compatible PC.
  • An independent computer for each system may perform the computer-implemented functions of these systems or one computer may combine the computerized functions of two or more systems.
  • One or more computers may maintain expression analysis database 122', expression mining database 124', and querying and mining system 126' independent of the computers operating the systems of Fig. 1'.
  • Fig. 2A' depicts a block diagram of a host computer system 210' suitable for implementing a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein.
  • Fig. 2A' illustrates a host computer system 210' including a bus 212' which interconnects major subsystems such as a central processor 214', a system memory 216' (typically RAM), an input/output (I/O) adapter 218', an external device such as a display screen 224' via a display adapter 226', a keyboard 232' and a mouse 234' via an I/O adapter 218', a SCSI host adapter 236', and a removable disk drive 238' operative to receive a removable disk 240'.
  • SCSI host adapter 236' may act as a storage interface to a fixed disk drive 242' or a CD-ROM player 244' operative to receive a CD-ROM 246'.
  • Fixed disk 242' may be a part of host computer system 210' or may be separate and accessed through other interface systems.
  • A network interface 248' may provide a direct connection to a remote server via a telephone link or to the Internet.
  • Network interface 248' may also connect to a local area network (LAN) or other network interconnecting many computer systems. Many other devices or subsystems (not shown) may be connected in a similar manner.
  • It is not necessary for all of the devices shown in Fig. 2A' to be present to practice the present invention, as discussed below.
  • The devices and subsystems may be interconnected in different ways from that shown in Fig. 2A'.
  • The operation of a computer system such as that shown in Fig. 2A' is readily known in the art and is not discussed in detail in this application.
  • Code to implement the present invention may be operably disposed or stored in computer-readable storage media such as system memory 216', fixed disk 242', CD-ROM 246', or floppy disk 240'.
  • Fig. 2B' depicts a simplified diagram of a network 260' interconnecting multiple computer systems 210a'-210e'.
  • Network 260' may be a local area network (LAN), wide area network (WAN), etc.
  • Bioinformatics database 102' and the computer-related operations of the other elements of Fig. 2B' may be divided amongst computer systems 210' in any way with network 260' being used to communicate information among the various computers.
  • Portable storage media such as removable disks may be used to carry information between computers instead of network 260'.
  • Fig. 3A' depicts a flowchart 301' of simplified process steps for managing information about a plurality of experiments conducted on a plurality of samples in a particular representative embodiment according to the present invention.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • Each experiment can provide an indication of a degree of expression of particular genetic sequences in a sample.
  • In a step 310', at least one of the plurality of samples is registered with a centralized database.
  • In a step 312', a plurality of information about the plurality of samples is tracked. The result of step 312' is that the information about samples can be incorporated into the database.
  • In a step 314', a plurality of information about the plurality of experiments is tracked. Changes to the experimental environment in the laboratory are reflected in the database by the function of step 314'.
  • A sample history is produced from the information in the database. The sample history describes the state of the plurality of samples.
  • The information about the plurality of experiments and the information about the plurality of samples is filtered according to one or more filters selected by a user to produce expression sequence information.
  • The expression sequence information resulting from the operation of the experiments in the laboratory can be published on a public database, which can be accessed by a web-based user interface or other means.
  • Fig. 3B' depicts a flowchart 303' of simplified process steps for viewing the results of a plurality of samples in another embodiment according to the present invention.
  • The results can be stored in one or more databases.
  • The user specifies a database to query.
  • In a step 324', one or more queries are submitted to the database in order to form a result.
  • The result can be viewed by the user by means of a display.
  • The result can be filtered according to one or more user-specified filters.
  • The filtered result can be placed into a graphical form.
  • Fig. 3C provides a representative flow chart 305' of simplified process steps for managing information about a plurality of experiments conducted on samples in a particular embodiment according to the present invention.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • In a step 330', the sample is registered with a database.
  • The experiment setup is performed.
  • In a step 334', aliquoting is performed.
  • In a step 336', RNA is extracted.
  • A polymerase chain reaction (PCR) is performed on the RNA in a step 338'.
  • In a step 340', cRNA is labeled.
  • Fragmentation is performed.
  • Hybridization is performed in a step 344'.
  • In a step 346', scanning of the hybridized chip is performed.
  • In a step 348', grid alignment is performed.
  • Cell average analysis is performed in a step 350'.
  • In a step 352', probe array analysis is performed, and in a step 354' a composite analysis is performed.
  • Fig. 4A' illustrates a representative database structure in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 4A' illustrates a client work station 401', which can be one of the workstations 210' of Fig. 2B', for example, that can be interconnected with one or more of a plurality of databases.
  • GATC database 403' contains a plurality of gene chip results in GATC format.
  • GATC format provides a standardized interface for gene chip data across multiple systems. Reference may be had to http://www.
  • Database 405' provides data mining information, and can include FAQs and preferences.
  • Database 407' comprises annotations, descriptions and URLs for gene information. Embodiments can include all of the above databases, or can comprise a subset of the databases, or still further can include other databases without departing from the scope of the claimed invention.
  • The database structure of Fig. 4A' can provide data management functions, data publishing functions, and integration with gene chip clients such as client 401'.
  • Data management functions can comprise a Laboratory Information Management System (LIMS).
  • Embodiments implementing LIMS according to the present invention can provide functions of data tracking, such as process inputs, process outputs and process environments.
  • Data security functions such as authentication, access permissions and privileges, can include separating owners having write access and user groups with read-only access.
  • Data sharing functions can provide for group access to data.
  • Data publishing and sharing can be facilitated by compliance with a standardized data format.
  • GATC format can be used. This standardized format provides cross-system access to gene chip data.
  • The database server can be an Internet server providing web browser access.
  • Embodiments can include scripting capability and can provide analysis functions at the server. Some embodiments can provide communications with the database application through web applications, such as browsers and the like, and gene chip interfaces.
  • Databases can be embodied in a server such as an SQL server, an ORACLE server and the like.
  • The database server can be resident on a number of platforms such as ORACLE NT, UNIX and the like.
  • Fig. 4B' illustrates a data source selection window 409' having a plurality of data sources from which gene and experiment information can be obtained, searched, and manipulated in a particular embodiment according to the present invention.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 4B' illustrates a plurality of different database formats including, but not limited to, MICROSOFT EXCEL files, text files, MICROSOFT ACCESS 97 Database, AlfaPublish, DataMiningInfo, GeneInfo, JetForm ASCII files, JetForm dBase, JfDbFetchDBF, JfSample, JetForm Filler Example, Forms Track, JetForm Excel, JetForm Excel 5, AFFYMETRIX, Publish_Static, GeneChipLIMS, EliPublish, GEData, and others.
  • Many embodiments according to the present invention can provide for automation of experimental data collection and analyses, as well as publication of results.
  • Many embodiments according to the present invention can provide expression analysis, sample registration and result publication for a plurality of experiments for a particular sample, as well as for a plurality of samples. Additionally, the methods and techniques of the present invention can automate the definition of user parameters for analyses and the like.
  • Fig. 5A' illustrates a representative automation page in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 5A' illustrates an automation page 501' having a sample information section 502', an experiment information section 504' and a sample experiment probe array section 506'.
  • Sample information section 502' provides fields for entering data such as a sample name, a sample type, a project name and a description of the sample and any comments. Fields for entering other data can also be included in various embodiments of the present invention.
  • Experiment information section 504' includes fields for entering experiment name, a probe array image identifier, a probe array type and information about the probe array such as a lot number, an analysis set, a cell average set, as well as a target database for publishing results.
  • Section 506' provides a display for matching sample probe arrays, sample experiments and probe array identifiers. A presently preferable embodiment provides the capability to have multiple samples as well as the capability to have multiple experiments per sample.
  • Fig. 5B' illustrates an automation results page 503' in a particular embodiment according to the present invention.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • Automation results page 503' provides a display of a plurality of steps in the setup and execution of an experiment and a result for a particular sample for each of the steps. For example, as illustrated by Fig. 5B', a sample first step entitled, "sample demo past registration" has received a pass result.
  • Other steps can be included in various embodiments without departing from the scope of the claims of the present invention.
  • Fig. 5C illustrates a representative expression scan screen 505' in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 5C illustrates information about a pending scan.
  • Screen 505' includes a hybridized expression probe array image identifier field 510', which users can use to select particular probe arrays for scanning.
  • A sample and experiment information field 512' provides information about the sample such as its name, a project, the type of sample, the user's identifier and the date, as well as information about the experiment.
  • Probe array information field 514' provides information about the probe array image such as the identifier, the array type and the lot number.
  • Hybridization information field 516' provides information about reagents and lot numbers.
  • A plurality of filter fields 518' provide the capability to filter sample projects, sample types and probe array types.
  • Fig. 6A' illustrates a representative sample registration screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 6A' illustrates sample registration screen 601' having fields for entry of data that describe the sample. For example, screen 601' includes fields for entering a sample name 602', sample project, sample type, as well as comments and description fields.
  • An initial process entry point field 604' enables the user to select a particular point in the laboratory's processes as a starting point.
  • A registered samples field 606' provides a listing of samples that have been registered.
  • A sample information field 608' provides information about the various samples.
  • Fig. 6B' illustrates a plurality of screens before automating laboratory information management in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 6B' illustrates screens 610' for performing experiment setup. Screens 612' provide for performing the aliquoting step. Screens 614' provide for performing RNA extraction. Screens 616' provide for performing RT PCR. Screens 618' provide for performing cRNA labeling and screens 620' provide for performing fragmentation. Other screens and different types or designs of screens can be used in various embodiments according to the present invention without departing from the scope of the claims herein.
  • Fig. 6C illustrates representative hybridization screens in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 6C illustrates a screen 621' for controlling hybridization processes. Screen 621' comprises a pending hybridization fragmented expression vessel identifier field 622'. Such hybridization fragmented expression vessels contain samples that have been fragmented.
  • Sample and experiment information field 624' provides tracking information about samples and experiments in the hybridization process.
  • Pending scan fields 626' provide hybridized expression and probe array image identification information.
  • Fig. 6C also illustrates hybridization control screen 623' and hybridization control screen 625'.
  • Fig. 6D' illustrates grid alignment control screens in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 6D' illustrates a grid alignment control screen 631'.
  • Grid alignment control screen 631' comprises a pending grid alignment display area 632' as well as a completed grid alignment display area 634'.
  • Sample experiment information fields 636' provide information about samples and experiments in the grid alignment process.
  • File type information field 638' provides identification information about the file type.
  • A probe array information field 639' provides identification information about the probe array.
  • Fig. 6E' illustrates a representative cell average analysis screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 6E' illustrates screen 641' having a plurality of fields for entering information about sample projects, experiment names, sample types, probe array types, user names, image data/probe array type, cell average name, image data and cell data, algorithm and other parameters.
  • A results area 642' provides information for a particular image name, a cell name, a probe array type and various parameters. The results area also provides a pass/fail indication for the particular experiment.
  • Fig. 6F' illustrates a representative probe array analysis screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 6F' illustrates screen 651' having a plurality of fields for entering information about sample projects, experiment names, sample types, probe array types, user names, cell data/probe array type, probe array name, probe array data, algorithm and other parameters.
  • Fig. 6F' also illustrates a results area 652' having a cell name, a probe array name, a probe array type, a parameters area and a results area for providing a pass/fail indication.
  • Fig. 6G' illustrates a composite analysis screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 6G' illustrates a screen 661' having a plurality of fields for entering information about sample projects, experiment names, sample types, user names, sense/anti-sense probe array, composite name, composite data, algorithm and other parameters. Additionally, screen 661' provides a results area 662' for displaying a sense chip file name, anti-chip file name, composite file name, a parameters area and a results area for providing a pass/fail indication of results.
  • Sample history screen 681' provides a historical listing of processes which have completed with respect to a particular sample.
  • Fig. 7A' illustrates a representative expression analysis screen for working with sets in a particular embodiment according to the present invention.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • Fig. 7A' illustrates screen 701' having a plurality of fields including a probe array type field 710', a user name field 712', an algorithm field 714', a cell average name field 716', a parameter field 718', an existing set name field 711', a create update set name field 713', and a results area 719'.
  • The results area provides fields for image name, cell name, probe array type, algorithm, set name and an area for indicating a pass/fail result for the expression analysis step.
  • Some embodiments can provide support for batch analysis of experimental results and user parameter sets.
  • Fig. 7B' illustrates a create set name screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 7B' illustrates a screen 703' having a probe array type field 720', a probe array types used field 722', an existing set names field 724', and an area for specifying scaling and normalizations for various chips.
  • Fig. 7C illustrates an expression cell data analysis screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 7C illustrates screen 705' having a plurality of fields for describing filter parameters. Filtering can be performed on a number of fields such as the assay type, data type, probe array type, date (including month, day and year), sample project, experiment name, sample type, user name and others.
  • Figs. 8A'-8C illustrate representative Expression Data Mining Tool (EDMT) screens in a particular embodiment according to the present invention. These diagrams are merely illustrations and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 8A' illustrates an EDMT screen 801'. Screen 801' comprises a plurality of areas, such as an area 802' that provides information about filters. Filters can be applied to the experimental data to narrow down the field of data to mine.
  • A results area 804' provides results of the filtered data.
  • A graphs area 806' provides a plurality of formats of graphs for viewing the data.
  • Fig. 8B' illustrates a filter area such as filter area 802' of Fig. 8A' in a particular embodiment according to the present invention.
  • Fig. 8B' illustrates filter area 802' having fields for a project filter 812', a probe array filter 814', a sample-type filter 816', an operator filter 818', a sample name filter 820', an experiment filter 822' and an analysis filter 824'.
  • Fig. 8B' also illustrates a filter results field for illustrating the type of filters being applied to the data. Queries can be described using the filters of filter area 802'. In a presently preferable embodiment, a user can select the analyses to query and then select the ranges on the results.
  • Fig. 8C illustrates a results area such as results area 804' of Fig. 8A' in a particular embodiment according to the present invention.
  • Fig. 8C illustrates results area 804' having an experimental results table 830', a query results table 832' and a pivot results table 834'.
  • Figs. 8D'-8G' illustrate representative graphs such as can be displayed in graph section 806' of Fig. 8A' in a particular embodiment according to the present invention. These diagrams are merely illustrations and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 8D' illustrates a scatter-type graph of experimental results. The scatter graph can graph any numeric result on a logarithmic or linear scale. Further, a presently preferable embodiment can provide the capability to have multiple analyses per axis. A description of the probe set is included on the right side of the graph. A hotlink to external databases can also be provided, at least in the preferred embodiment according to the present invention. Other options such as filters, point sizes, colors and the like can be specified by the user.
  • Fig. 8E' illustrates a fold change graph that can be displayed in graph area 806' of Fig. 8A' in a particular embodiment according to the present invention.
  • The fold change graph of Fig. 8E' can be provided using logarithmic or linear scales. The capability to provide a probe set description, hotlinks to external databases, and recomputation of fold change can also be provided by particular embodiments according to the present invention. Further, users can specify options such as point sizes, colors and the like.
  • Fig. 8F' illustrates a representative bar graph such as can be displayed in graph area 806' of Fig. 8A' in a particular embodiment according to the present invention.
  • The bar graph of Fig. 8F' can graph any numeric result, and embodiments can provide users with the capability to change options such as bar size, colors and the like.
  • Fig. 8G' illustrates a representative histogram graph such as can be displayed in graph area 806' of Fig. 8A'.
  • The histogram graph of Fig. 8G' provides the ability to histogram average differences to indicate various landmarks and can provide the user with the capability to specify options such as bin size, range, colors and the like.
  • Fig. 9A' illustrates a queries display screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 9A' illustrates name saved queries screen 901' having a display area for a plurality of filters.
  • Fig. 9B' illustrates an annotation screen 903' in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Annotation screen 903' provides a mechanism for displaying information about a probe set. Annotations can include an annotation text, a type of the annotation as well as other useful information. Annotation types can be user defined in a preferred embodiment. A user name can also be specified and a date can be specified. Other information can be specified in some embodiments and not all of this information will be specified in some embodiments.
  • Fig. 9C illustrates an example of displaying a probe annotation such as was configured in annotation screen 903' of Fig. 9B' in a particular embodiment according to the present invention.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • Fig. 9C illustrates a highlighted line of information 904' for which a corresponding probe annotation 906' is displayed.
  • The probe annotation can provide the name of the probe, a description and other useful information.
  • Fig. 9D' illustrates a query annotation screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 9D' illustrates query annotation screen 910' having fields to specify probe set types, annotations, a user identifier, a date, and a description. Query annotations can provide the ability to specify multiple filters and can also provide the ability to update annotations.
  • Fig. 9E' illustrates a probe set description screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 9E' illustrates probe set description screen 912' having the name of a probe set and an associated description. These descriptions can also be displayed in the expression data mining tool screen 801' under the results section 804'.
  • Fig. 9F' illustrates a search screen for searching array descriptions in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 9F' illustrates search array descriptions screen 914' having a search field 916' for accepting input, and an output field 918' for displaying the probe sets whose descriptions match the text entered in the input field.
  • Search array descriptions screen 914' provides users with the capability to search descriptions in the database. The user can define the search criteria using the input field and can add the results to various filters.
  • Fig. 10A' illustrates screens for searching external databases in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 10A' illustrates a probe set description dialog screen 1002' having a probe set name, a description and various annotations. The user can search using the probe set description dialog screen 1002' for information corresponding to the description in external databases. When the user selects the Entrez database in dialog screen 1002', a browser window 1004' is displayed. Browser window 1004' provides for browsing information about genetic expression sequences and the like in external databases such as the Entrez database.
  • A URL can be associated with a particular probe set. Further, multiple URLs can be associated with a particular probe set, and a browser window can be automatically activated by the system to display relevant information about a probe set from external databases.
  • Fig. 10B' illustrates a FAQ display selection screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 10B' illustrates a FAQ selection screen 1008' having a plurality of frequently used searches. A user can perform one of the searches by simply selecting the desired search.
  • A dialog screen 1010' can be displayed to the user upon selection of a particular FAQ. Dialog screen 1010' provides a plurality of questions that the user can answer in order to define the selected search.
  • FAQs can be stored in data mining information database 306'.
  • Fig. 10C illustrates a gene chip migration screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives.
  • Fig. 10C illustrates gene chip migration screen 1022' having a display area for local files in a plurality of formats 1024', a display area 1026' indicating data to migrate, a status area 1028' and a LIMS sample area 1030'.
  • The migration screen can be used to add gene chip data to the LIMS. In a preferred embodiment, it can facilitate association of information about samples, experiments, scan data and results.
  • Fig. 10D' illustrates fluidics station control screens 1031' and 1032' in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fluidics control screens 1031' and 1032' can provide the user with the capability to control a fluidics station based upon selection of particular experiment names and protocols. The user can specify assay types, sample projects, reagents and protocols using the fluidics control screens.
  • Fig. 10E' illustrates scanner control screens 1041' and 1042' for controlling the scanning to a local drive or to a network in a particular embodiment according to the present invention.
  • This diagram is merely an illustration and should not limit the scope of the claims herein.
  • Scan control screens 1041' and 1042' provide the capability to the user to specify experiment name, probe array types, number of scans to be performed, assay-types, sample projects, experiments and a display of the scanned experiments.
  • Fig. 10F' illustrates experiment information screens 1051' and 1052' in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Experiment information screens 1051' and 1052' provide the user with the capability to specify experiment names, probe arrays, probe array lots, operators, sample types, sample descriptions, projects, comments, reagents and reagent lots.
  • The present invention provides a method for mining experiment information for patterns selectable by a user.
  • One advantage is that the method provides better access to genetic expression information than methods known in the prior art.
  • Another advantage provided by this approach is that the results of numerous experiments can be mined effectively using visualization techniques and set theory queries, for example.
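The set theory queries mentioned above can be pictured with a small sketch. It is an illustration only, not the patented implementation: the `Result` record, the `probe_sets_where` helper, the sample data and the threshold of 500 are all hypothetical. Each filter yields a set of probe set names, and the sets are then combined with ordinary difference, union and intersection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One analysis result for one probe set (hypothetical record layout)."""
    experiment: str
    sample_type: str
    probe_set: str
    avg_diff: float


def probe_sets_where(results, predicate):
    """Return the set of probe set names whose results satisfy the predicate."""
    return {r.probe_set for r in results if predicate(r)}


# Made-up example data, for illustration only.
results = [
    Result("exp1", "tumor", "probeA", 820.0),
    Result("exp1", "tumor", "probeB", 35.0),
    Result("exp2", "normal", "probeA", 90.0),
    Result("exp2", "normal", "probeB", 40.0),
    Result("exp2", "normal", "probeC", 610.0),
]

# Two filters, each producing a set of probe sets (threshold is an assumption).
high_in_tumor = probe_sets_where(
    results, lambda r: r.sample_type == "tumor" and r.avg_diff > 500)
high_in_normal = probe_sets_where(
    results, lambda r: r.sample_type == "normal" and r.avg_diff > 500)

tumor_only = high_in_tumor - high_in_normal   # set difference
either = high_in_tumor | high_in_normal       # set union
both = high_in_tumor & high_in_normal         # set intersection
```

With this data, `tumor_only` contains only `probeA`, `either` contains `probeA` and `probeC`, and `both` is empty.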

Abstract

According to the invention, a system and method are provided for organizing expression information in a way that facilitates data mining. A database model is provided which may organize information relating to, e.g., sample preparation, expression analysis of experiment results, and intermediate and final results of mining expression and concentration results (124). The model is readily translatable into database languages such as SQL and the like. The database model scales to permit mining of expression or concentration information collected from large numbers of samples (126). Embodiments of the present invention can provide a computer based method for managing information about a plurality of experiments conducted on a plurality of samples and the like (112-118).
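As a rough illustration of how such a model might translate into SQL, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The table names, column names and helper functions (`register_sample`, `add_experiment`, `add_result`, `mine`) are assumptions made for this example and do not come from the patent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-table layout: samples, experiments on samples, results.
conn.executescript("""
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    sample_type TEXT,
    project TEXT
);
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    name TEXT NOT NULL,
    probe_array_type TEXT
);
CREATE TABLE expression_result (
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    probe_set TEXT NOT NULL,
    expression_level REAL NOT NULL
);
""")


def register_sample(name, sample_type, project):
    """Register a sample and return its database id."""
    cur = conn.execute(
        "INSERT INTO sample (name, sample_type, project) VALUES (?, ?, ?)",
        (name, sample_type, project))
    return cur.lastrowid


def add_experiment(sample_id, name, probe_array_type):
    """Record an experiment conducted on a registered sample."""
    cur = conn.execute(
        "INSERT INTO experiment (sample_id, name, probe_array_type) "
        "VALUES (?, ?, ?)",
        (sample_id, name, probe_array_type))
    return cur.lastrowid


def add_result(experiment_id, probe_set, level):
    """Store one expression result for an experiment."""
    conn.execute(
        "INSERT INTO expression_result VALUES (?, ?, ?)",
        (experiment_id, probe_set, level))


def mine(project, min_level):
    """Return results for a project at or above a minimum expression level."""
    return conn.execute(
        """SELECT s.name, e.name, r.probe_set, r.expression_level
           FROM expression_result r
           JOIN experiment e ON e.id = r.experiment_id
           JOIN sample s ON s.id = e.sample_id
           WHERE s.project = ? AND r.expression_level >= ?
           ORDER BY r.expression_level DESC""",
        (project, min_level)).fetchall()


# Made-up usage data.
s1 = register_sample("S-001", "tissue", "demo")
e1 = add_experiment(s1, "E-001", "array-X")
add_result(e1, "probeA", 12.5)
add_result(e1, "probeB", 230.0)
hits = mine("demo", 100.0)
```

Here `hits` holds the single row for `probeB`. Keeping samples, experiments and results in separate joined tables is one common way to let queries filter on sample or experiment attributes without duplicating them per result.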

Description

METHOD AND APPARATUS FOR PROVIDING AN EXPRESSION DATA MINING DATABASE AND LABORATORY INFORMATION MANAGEMENT
CROSS-REFERENCES TO RELATED APPLICATIONS
This application claims priority from the following U.S. Provisional Applications, the entire disclosure of which, including all appendices and all attached documents, is incorporated by reference in its entirety for all purposes:
U.S. Provisional Patent Application No. 60/100,724 filed on September 17, 1998, entitled METHOD AND APPARATUS FOR PROVIDING A LABORATORY INFORMATION MANAGEMENT SYSTEM, (Attorney Docket Number 018547-037500US); and
U.S. Provisional Patent Application No. 60/100,740 filed on September 17, 1998, entitled METHOD AND APPARATUS FOR PROVIDING AN EXPRESSION DATA MINING DATABASE, (Attorney Docket Number 018547-033840US).
Furthermore, commonly owned, copending U.S. Patent Application No. 09/122,167, entitled METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE, filed on July 24, 1998; and
U.S. Patent Application No. 09/122,434, entitled GENE EXPRESSION AND EVALUATION SYSTEM, filed July 24, 1998 are herein incorporated by reference.
STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT
Research leading to portions of the present invention was funded by the Department of Commerce through the National Institute of Standards and Technology.
BACKGROUND OF THE INVENTION
The present invention relates to computer systems and more particularly to computer systems for mining and for managing laboratory operations about gene expression levels. Devices and computer systems have been developed for collecting information about gene expression or expressed sequence tags (EST) in large numbers of samples. For example, PCT application WO92/10588, incorporated herein by reference for all purposes, describes techniques for sequence checking nucleic acids and other materials. Probes for performing these operations may be formed in arrays according to the pioneering techniques disclosed in U.S. Patent No. 5,143,854 and U.S. Patent No. 5,571,639, for example. Both of these U.S. Patents are incorporated herein by reference for all purposes.
According to one aspect of the techniques described in these patents, an array of nucleic acid probes is fabricated at known locations on a chip or substrate. A fluorescent label attached to a nucleic acid is then brought into contact with the chip and a scanner generates an image file indicating the locations where the labeled nucleic acids bound to the chip. Based upon the identities of the probes at these locations, information such as the monomer sequence of DNA or RNA can be extracted.
Computer-aided techniques for gene expression monitoring using such arrays of probes have been developed as disclosed in EP Pub. No. 0848067 and PCT publication No. WO 97/10365, the contents of which are herein incorporated by reference. Many diseases are characterized by differences in the degree that various genes are expressed either through changes in the copy number of the genetic DNA or through changes in levels of transcription (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) of particular genes. For example, losses and gains of genetic material play an important role in malignant transformation and progression. Furthermore, changes in the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors), serve as signposts for the presence and progression of various cancers. Information on expression of genes or expressed sequence tags may be collected on a large scale in many ways, including the probe array techniques described above. One of the objectives in collecting this information is the identification of genes or ESTs whose expression is of particular importance. Researchers use such techniques to answer questions such as: 1) Which genes are expressed in cells of a malignant tumor but not expressed in either healthy tissue or tissue treated according to a particular regime? 2) Which genes or ESTs are expressed in particular organs but not in others? 3) Which genes or ESTs are expressed in particular species but not in others?
Collecting vast amounts of expression data from large numbers of samples including many tissue types is useful in answering these questions. However, in order to derive full benefit from the investment made in collecting and storing expression data, techniques enabling one to efficiently mine the data to find items of particular relevance are highly desirable.
SUMMARY OF THE INVENTION
The present invention provides techniques for organizing expression or concentration information in a way that facilitates mining. A database model is provided which may organize information relating to, e.g., sample preparation, expression analysis of experiment results, and intermediate and final results of mining gene expression measurements, gene sets and the like. The model is readily translatable into database languages such as SQL and the like. The database model can scale to permit mining of gene expression measurements collected from large numbers of samples.
According to an embodiment of the present invention, a computer based method for mining a plurality of experiment information is provided. The method includes a variety of steps such as collecting information from experiments and chip designs. The method can include steps of selecting experiments to be mined. Experiment results and other information can be organized by experimental analysis, and the like. A step of defining one or more groupings for the experiments to be mined is also part of the method. The method also includes a step of selecting, based upon the groupings, information about the experiments to be mined to form a plurality of resulting information. This resulting information can include one or more resulting gene sets, and the like. Finally, the method formats the resulting information for viewing by a user. The combination of these steps can provide to the user the ability to access experiment information.
In some embodiments, visualization techniques can be used in conjunction with the steps of the method to enable users to more easily understand the results of the data mining. Further, in some embodiments, a step of recording conclusions about the results of the data mining can also be part of the method.
In another aspect according to the present invention, a method for working with expression information is provided. The method includes a variety of steps such as collecting information about results of experiments. A step of gathering information about samples and information about the experiments, which can comprise an experimental analysis and the like, is also part of the method. The step of adding one or more attributes to the information about the experiments can also be performed. The method then transforms the plurality of results of experiments into a plurality of transformed information. Transformations can include normalizing, de-normalizing, aggregation, scaling, and the like. Steps of mining the plurality of transformed information and visualizing the plurality of transformed information can also be part of the method.
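As a rough illustration of the transformation step, the following Python sketch scales one experiment's values to a common mean and aggregates replicate experiments by averaging. It is a minimal sketch under stated assumptions: the function names, the target mean of 100, and the sample values are hypothetical and are not taken from the specification, which does not prescribe a particular normalization or aggregation formula.

```python
from statistics import mean

def scale_to_target(values, target_mean=100.0):
    """Scale values so that their mean equals target_mean (hypothetical choice)."""
    factor = target_mean / mean(values)
    return [v * factor for v in values]

def aggregate_replicates(replicates):
    """Aggregate replicate experiments by averaging each gene's values."""
    # zip(*replicates) groups the values for the same gene across replicates.
    return [mean(per_gene) for per_gene in zip(*replicates)]

# Hypothetical raw values for three genes in one experiment.
scaled = scale_to_target([10, 20, 30])

# Hypothetical values for two genes measured in two replicate experiments.
aggregated = aggregate_replicates([[1, 2], [3, 4]])
```

Scaling each experiment before aggregation is one way to make values from different chips comparable; other transformations named in the text, such as de-normalizing, would need their own definitions.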
The present invention provides techniques for improved monitoring of genetic expression or sequence analysis. More particularly, the present invention provides a method for managing laboratory operations for monitoring expression or performing sequence analysis.
According to an embodiment of the present invention, a computer based method for managing information about a plurality of experiments conducted on a plurality of samples is provided. Each experiment can provide an indication of the degree that particular genes are expressed in a sample. The method includes a variety of steps such as registering at least one of the plurality of samples with a centralized database. The method can include steps of tracking a plurality of information about the samples and tracking a plurality of information about the experiments. A step of producing a sample history about the plurality of samples from the plurality of information can also be a part of the method. The method can include filtering the information about the experiments and the information about the samples according to parameters selected by a user. The information can be made available for publishing to a variety of targets such as a public database. The combination of these steps can provide a web based user interface that can enable the user to access the information.
In many embodiments, the experimental result information can be entered in a format that can provide cross platform use and sharing of the information. One such format is Genetic Analysis Technology Consortium ("GATC"), a standard for genomic databases provided by Molecular Dynamics, of Hayward, CA, and Affymetrix, Inc., of Santa Clara, CA. Reference may be had to http://www.gatconsortium.org for further information about GATC. However, many embodiments can use other standard formats, such as those commonly known in the art. In another aspect according to the present invention, a method for viewing the results of a plurality of experiments which are stored in at least one database is provided. The method includes a variety of steps such as specifying a database to query. One or more queries can be submitted to form a result. The user can then view the result. The result may be filtered according to one or more user specified factors of interest in order to form a filtered result, which can be put into a graphical form, for example, for ease of viewing.
Numerous benefits are achieved by way of the present invention over conventional techniques. Some embodiments according to the present invention can provide better access to genetic experiment information than methods known in the prior art. In some embodiments, the present invention is more cost effective than conventional techniques. Embodiments can provide answers to queries such as, "show all genes where the gene expression value is greater than or equal to 100, where at least three genes out of four respond to the query," as well as answers to many other and varied useful queries. Another advantage provided by this approach is that the results of numerous experiments can be mined effectively using visualization techniques and set theory queries. Some embodiments according to the invention are less complex than known techniques. The present invention can also provide a graphical indication of laboratory analysis processes that is substantially clear for viewing.
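The kind of threshold query quoted above, combined with a set theory follow-up, can be sketched in Python as shown here. This is only an illustration: the gene names and values are invented, and the sketch reads "at least three out of four" as "at least three of four experiments," which is an assumption about the intended meaning of the quoted query.

```python
# Hypothetical expression values: gene -> value in each of four experiments.
expression = {
    "geneA": [120, 150, 99, 300],
    "geneB": [80, 100, 40, 10],
    "geneC": [100, 100, 100, 100],
    "geneD": [500, 20, 30, 101],
}

def genes_meeting_threshold(data, threshold=100, min_hits=3):
    """Return genes whose value is >= threshold in at least min_hits experiments."""
    return {
        gene
        for gene, values in data.items()
        if sum(v >= threshold for v in values) >= min_hits
    }

# Genes at or above 100 in at least three of the four experiments.
high = genes_meeting_threshold(expression)

# Genes at or above 100 in at least one experiment.
ever_high = genes_meeting_threshold(expression, min_hits=1)

# Set difference: genes that cross the threshold somewhere but not consistently.
inconsistent = ever_high - high
```

Because each query returns a set, results can be combined with union, intersection, and difference without re-reading the underlying data.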
A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 illustrates a representative system and process for forming and analyzing arrays of biological materials such as DNA or RNA in a particular embodiment according to the present invention.
Fig. 2A illustrates a computer system suitable for use in conjunction with the representative system of Fig. 1.
Fig. 2B illustrates a computer network suitable for use in conjunction with the representative system of Fig. 1.
Fig. 3 illustrates an entity relationship diagram for interpreting a database model.
Figs. 4A-4F illustrate a database model for maintaining information for the system and method of Fig. 1 in a particular embodiment according to the present invention.
Figs. 5A-5B depict simplified flowcharts of representative process steps in select embodiments according to the invention.
Figs. 6A-6F illustrate representative block flow diagrams in a particular embodiment according to the present invention.
Figs. 7A-7O illustrate representative user interface screens in a particular embodiment according to the present invention.
Fig. 1' illustrates an overall system and process for forming and analyzing arrays of biological materials such as DNA or RNA in a particular embodiment according to the present invention;
Figs. 2A'-2B' illustrate computer systems suitable for use in conjunction with the overall system of Fig. 1' in a particular embodiment according to the present invention;
Figs. 3A'-3C' illustrate simplified flowcharts of representative process steps in particular embodiments according to the invention;
Figs. 4A'-4B' illustrate representative database structures and data formats in a particular embodiment according to the present invention;
Figs. 5A'-5C' illustrate representative automation screens in a particular embodiment according to the present invention;
Figs. 6A'-6H' illustrate representative expression analysis screens in a particular embodiment according to the present invention;
Figs. 7A'-7C' illustrate representative expression analysis screens for working with sets in a particular embodiment according to the present invention;
Figs. 8A'-8G' illustrate representative expression data mining screens in a particular embodiment according to the present invention;
Figs. 9A'-9F' illustrate representative annotation screens in a particular embodiment according to the present invention; and
Figs. 10A'-10F' illustrate representative function screens in a particular embodiment according to the present invention.
DESCRIPTION OF THE SPECIFIC EMBODIMENTS
One embodiment of the present invention operates in the context of a system for analyzing biological or other materials using arrays that themselves include probes that may be made of biological materials such as RNA or DNA. The VLSIPS™ and GeneChip™ technologies provide methods of making and using very large arrays of polymers, such as nucleic acids, on very small chips. Reference may be had to U.S. Patent No. 5,143,854 and PCT Patent Publication Nos. WO 90/15070 and 92/10092, each of which is hereby incorporated by reference in its entirety for all purposes. Nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the "target" nucleic acid). It should be understood that the probes need not be nucleic acid probes but may also be other polymers such as peptides. Peptide probes may be used to detect the concentration of peptides, polypeptides, or polymers in a sample. The probes should be carefully selected to have bonding affinity to the compound whose concentration they are to be used to measure.
Fig. 1 illustrates a simplified diagram of a representative example system 100 for forming and analyzing arrays of biological materials such as RNA or DNA. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. A chip design system 104 is used to design arrays of polymers, for example biological polymers such as RNA or DNA. Chip design system 104 may be, for example, an appropriately programmed Sun Workstation or personal computer or workstation, such as an IBM PC equivalent, and the like. Chip design system 104 obtains inputs from a user regarding chip design objectives including characteristics of genes of interest, and other inputs regarding the desired features of the array. Optionally, chip design system 104 may obtain information regarding a specific genetic sequence of interest from bioinformatics database 102 or from external databases such as GenBank. The output of chip design system 104 is a set of chip design computer files in the form of, for example, a switch matrix, as described in PCT application WO 92/10092, and other associated computer files. Systems for designing chips for sequencing, sequence checking and expression analysis are disclosed in U.S. Patent No. 5,571,639 and in PCT application WO 97/10365, the entire contents of which are herein incorporated by reference for all purposes. The chip design files are input to a mask design system (not shown) that designs the lithographic masks used in the fabrication of probe arrays of molecules such as DNA. The mask design system generates mask design files that are then used by a mask construction system (not shown) to construct masks or other synthesis patterns such as chrome-on-glass masks for use in the fabrication of polymer arrays.
The masks are used in a synthesis system (not shown). The synthesis system includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip. The synthesis system includes a light source and a chemical flow cell on which the substrate or chip is placed. A mask is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times for deprotection of selected regions of the chip. Selected chemical reagents are directed through the flow cell for coupling to deprotected regions, as well as for washing and other operations. The substrates fabricated by the synthesis system are optionally diced into smaller chips. The output of the synthesis system is a chip ready for application of a target sample. Information about the mask design, mask construction, and probe array synthesis systems is presented by way of background.
A biological source 112 is, for example, tissue from a plant or animal. Various processing steps are applied to material from biological source 112 by a sample preparation system 114. These steps may include isolation of mRNA and precipitation of the mRNA to increase concentration. The result of the various processing steps is a target sample ready for application to the chips produced by the synthesis system 110. Sample preparation methods for expression analysis are discussed in detail in WO97/10365. The prepared samples include monomer nucleotide sequences such as RNA or DNA. When the sample is applied to the chip by a sample exposure system 116, the nucleotides may or may not bond to the probes. The nucleotides have been tagged with fluorescein labels to determine which probes have bonded to nucleotide sequences from the sample. The prepared samples will be placed in a scanning system 118. Scanning system 118 includes a detection device such as a confocal microscope or CCD (charge-coupled device) that is used to detect the location where labeled receptors have bound to the substrate. The output of scanning system 118 is one or more image files indicating, in the case of fluorescein labeled receptor, the fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate. Since higher photon counts will be observed where the labeled receptor has bound more strongly to the array of polymers, and since the monomer sequence of the polymers on the substrate is known as a function of position, it becomes possible to determine the sequence(s) of polymer(s) on the substrate that are complementary to the receptor.
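The last step of this paragraph, going from intensity as a function of position to the probe sequences that bound labeled material, can be pictured with a small Python sketch. It is a simplified illustration only: the probe layout, the photon counts, the fixed threshold of 500, and the helper names are all hypothetical, and real analysis of scanned images is considerably more involved.

```python
# Translation table for DNA base pairing (A-T, C-G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence that would hybridize to seq."""
    return seq.translate(COMPLEMENT)[::-1]

# Hypothetical chip layout: position on the substrate -> probe sequence.
probe_at = {(0, 0): "ACGT", (0, 1): "GGCA", (1, 0): "TTAG"}

# Hypothetical scanner output: position -> photon count.
intensity = {(0, 0): 1500, (0, 1): 40, (1, 0): 900}

def bound_targets(probe_at, intensity, threshold=500):
    """Map each bright position to the target sequence complementary to its probe."""
    return {
        pos: reverse_complement(probe_at[pos])
        for pos, counts in intensity.items()
        if counts >= threshold
    }

targets = bound_targets(probe_at, intensity)
```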
The image files and the design of the chips are input to an analysis system 120 that, e.g., calls base sequences, or determines expression levels of genes or expressed sequence tags. The expression level of a gene or EST is herein understood to be the concentration within a sample of mRNA or protein that would result from the transcription of the gene or EST. Such analysis techniques are disclosed in WO97/10365 and U.S. App. No. 08/531,137, the entire contents of which are herein incorporated by reference for all purposes.
An expression analysis database 122 maintains information used to analyze expression and the results of expression analysis. Contents of expression analysis database 122 may include tables listing analyses performed, analysis results, experiments performed, sample preparation protocols and parameters of these protocols, chip designs, etc. Details of one embodiment of expression analysis database 122 are described in U.S. Patent App. No. 09/122,167, entitled METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE, filed on July 24, 1998, the entire contents of which are incorporated herein by reference for all purposes.
One or more instantiations of expression analysis database 122 may contain information concerning the expression of many genes or ESTs as collected from many different tissue samples. It would be useful to use this information to investigate questions such as, e.g., 1) which genes or ESTs are upregulated (expressed more) or downregulated (expressed less) in diseased tissue, 2) how does gene expression vary among organs and tissue types within a species, 3) how does gene expression vary among species which share common genes, 4) how does gene expression respond to various disease treatment regimes, 5) how does gene expression vary with progression of disease, etc. To facilitate investigations of this kind, an expression mining database 124 is provided. Expression mining database 124 may include duplicate representations of data in expression analysis database 122. Expression mining database 124 may also include various tables to facilitate mining operations conducted by a user who operates a querying and mining system 126. Querying and mining system 126 includes a user interface that permits an operator to make queries to investigate expression of genes and ESTs and answer the types of questions identified above. An example of a querying and mining system is described in U.S. Patent Application No. 09/122,434, entitled GENE EXPRESSION AND EVALUATION SYSTEM, filed July 24, 1998, the entire contents of which are incorporated herein by reference for all purposes.
Chip design system 104, analysis system 120 and control portions of exposure system 116, sample preparation system 114, and scanning system 118 may be appropriately programmed computers such as a Sun workstation or IBM-compatible PC. An independent computer for each system may perform the computer-implemented functions of these systems or one computer may combine the computerized functions of two or more systems. One or more computers may maintain expression analysis database 122, expression mining database 124, and querying and mining system 126 independent of the computers operating the systems of Fig. 1.
Fig. 2A depicts a simplified block diagram of a representative host computer system 210 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Host computer system 210 includes a bus 212 which interconnects major subsystems such as a central processor 214, a system memory 216 (typically RAM), an input/output (I/O) adapter 218, an external device such as a display screen 224 via a display adapter 226, a keyboard 232 and a mouse 234 via an I/O adapter 218, a SCSI host adapter 236, and a removable disk drive 238 operative to receive a removable disk 240. SCSI host adapter 236 may act as a storage interface to a fixed disk drive 242 or a CD-ROM player 244 operative to receive a CD-ROM 246. Fixed disk 242 may be a part of host computer system 210 or may be separate and accessed through other interface systems. A network interface 248 may provide a direct connection to a remote server via a telephone link or to the Internet. Network interface 248 may also connect to a local area network (LAN) or other network interconnecting many computer systems. Many other devices or subsystems (not shown) may be connected in a similar manner.
Also, it is not necessary for all of the devices shown in Fig. 2A to be present to practice the present invention, as discussed below. The devices and subsystems may be interconnected in different ways from that shown in Fig. 2A. The operation of a computer system such as that shown in Fig. 2A is readily known in the art and is not discussed in detail in this application. Code to implement the present invention may be operably disposed or stored in computer-readable storage media such as system memory 216, fixed disk 242, CD-ROM 246, or removable disk 240.
Fig. 2B depicts a simplified diagram of a network 260 interconnecting multiple computer systems 210a-210e. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Network 260 may be a local area network (LAN), wide area network (WAN), etc. Bioinformatics database 102 and the computer-related operations of the other elements of Fig. 2B may be divided amongst computer systems 210 in any way with network 260 being used to communicate information among the various computers. Portable storage media such as removable disks may be used to carry information between computers instead of network 260.
The contents and structure of expression mining database 124 in a particular representative example embodiment according to the present invention will now be described. Expression mining database 124 is preferably a multidimensional relational database with a complex internal structure. However, other types of databases can also be used in select embodiments without departing from the scope of the present invention. The structure and contents of expression mining database 124 will be described with reference to a model that describes the contents of tables of the database as well as interrelationships among the tables.
A visual depiction of this model will be an Entity Relationship Diagram (ERD) which includes entities, relationships, and attributes. A detailed discussion of ERDs is found in "ERwin version 3.5.2 Methods Guide" available from Platinum Technologies, Inc., the contents of which are herein incorporated by reference for all purposes. Those of skill in the art will appreciate that automated tools such as ERwin and Developer 2000 available from Oracle will convert the ERD from Fig. 4A directly into executable code such as SQL code for creating and operating the database.
Fig. 3 illustrates a key to ERDs that will be used to describe the contents of chip design database 102. Fig. 3 is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. A representative table 302 includes one or more key attributes 304 and one or more non-key attributes 306. Representative table 302 includes one or more records where each record includes fields corresponding to the listed attributes. The contents of the key fields taken together identify an individual record. In the ERD, each table is represented by a rectangle divided by a horizontal line. The fields or attributes above the line are key attributes while the fields or attributes below the line are non-key attributes. An identifying relationship 308 signifies that the key attribute of a parent table 310 is also a part of a composite key attribute of a child table 312. A non-identifying relationship 314 signifies that the key attribute of a parent table 316 is also a non-key attribute of a child table 318. Foreign keys, denoted by (FK), comprise attributes of one table that are either a key or a part of a composite key of another table. For both the non-identifying and the identifying relationship, one record in the parent table corresponds to one or more records in the child table.
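The difference between an identifying and a non-identifying relationship can be made concrete with a short SQL schema, run here through Python's standard sqlite3 module. This is a minimal sketch, not output of ERwin or of any tool named in the text: the table names parent, child_identifying, and child_non_identifying, and all column names, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE parent (
    parent_id INTEGER PRIMARY KEY,
    name      TEXT
);
-- Identifying relationship: the parent key is part of the child's composite key.
CREATE TABLE child_identifying (
    parent_id INTEGER NOT NULL REFERENCES parent(parent_id),
    seq_no    INTEGER NOT NULL,
    note      TEXT,
    PRIMARY KEY (parent_id, seq_no)
);
-- Non-identifying relationship: the parent key is a non-key attribute of the child.
CREATE TABLE child_non_identifying (
    child_id  INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent(parent_id),
    note      TEXT
);
""")

conn.execute("INSERT INTO parent VALUES (1, 'p1')")
conn.executemany(
    "INSERT INTO child_identifying VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 2, "b")],
)
conn.execute("INSERT INTO child_non_identifying VALUES (10, 1, 'c')")

# One parent record corresponds to one or more child records.
identifying_count = conn.execute(
    "SELECT COUNT(*) FROM child_identifying WHERE parent_id = 1"
).fetchone()[0]

# A child row that points at a missing parent is rejected by the foreign key.
try:
    conn.execute("INSERT INTO child_non_identifying VALUES (11, 99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

In the identifying case a child row cannot be named without its parent key, while in the non-identifying case the child has its own key and merely refers to the parent.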
Fig. 4A illustrates a simplified entity relationship diagram (ERD) of elements of expression mining database 124 in a particular embodiment according to the present invention. Fig. 4A is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Rectangles in Fig. 4A correspond to tables in expression mining database 124. For each rectangle, the title of the table is listed above the rectangle. Within each rectangle, columns of the table are listed. Above a horizontal line within each rectangle are listed key columns, columns whose contents are used to identify individual records in the table. Below this horizontal line are the names of non-key attributes. The lines between the rectangles identify the relationships between records of one table and records of another table. First, the relationships among the various tables will be described. Then, the contents of each table will be discussed in detail.
In operation, expression mining database 124 is updated during mining operations. Certain tables are updated by importation and transformation from expression analysis database 122. Certain other tables may be updated as an operator of querying and mining system 126 defines a query operation. It can be useful to identify genes or ESTs whose expression varies in some way depending on one or more tissue attributes. Therefore, it is necessary for querying and mining system 126 to have awareness of tissue attributes associated with expression analysis results. One or more analysis results are typically associated with what is herein referred to as "leaf target samples."
In order to provide a more easily understood explanation of the workings of the present invention, the relationship between "leaf target samples" and tissue attributes will first be discussed. A "raw sample" represents a piece of extracted tissue. Before further processing, a single raw sample may be cleaved into multiple raw samples. The raw samples are the input to sample preparation system 114. For each raw sample, sample preparation system 114 prepares a so-called "target" which is a fluid including mRNA or other expression indicator. A "target" may be split into multiple "replicates" and replicates may be pooled to form another target. The individual "targets" that are applied to chips are the leaf target samples. Each application of a "leaf target sample" to a chip represents an experiment. In a presently preferable embodiment, expression analyses can be conducted on experiment data according to one or more selectable criteria to produce experimental analysis result data.
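The sample lineage described here forms a directed graph of derivations, which the following Python sketch walks in both directions. All sample identifiers and derivation labels are hypothetical, and treating a "leaf target sample" as an item from which nothing further is derived is an assumption made for illustration.

```python
from collections import defaultdict

# (parent, child, derivation type); every identifier here is hypothetical.
derivations = [
    ("raw1", "raw1a", "cleave"),
    ("raw1", "raw1b", "cleave"),
    ("raw1a", "T1", "prepare"),
    ("raw1b", "T2", "prepare"),
    ("T1", "R1", "split"),
    ("T1", "R2", "split"),
    ("R2", "T3", "pool"),
    ("T2", "T3", "pool"),
]

parents_of = defaultdict(set)  # child -> set of direct parents
has_child = set()              # items that were transformed further
items = set()
for parent, child, _kind in derivations:
    parents_of[child].add(parent)
    has_child.add(parent)
    items.update((parent, child))

# Items with no further derivation are the candidates for application to a chip.
leaf_samples = sorted(items - has_child)

def ancestors(item):
    """Return every sample item from which the given item was derived."""
    seen = set()
    stack = [item]
    while stack:
        for p in parents_of.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen
```

Tracing ancestors in this way is what lets tissue attributes recorded on a raw sample be associated with the experiments run on the leaf target samples derived from it.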
The tables of expression mining database 124 that relate to samples and attributes are identified in Fig. 4A by the letter "A." Leaf target samples, raw samples, replicates, targets, etc. are listed in a sample item table 402. A sample item derivation table 404 lists transformations from one sample item to another. Sample item derivation table lists, e.g., splitting, pooling, and cleaving operations, transformations from raw samples to targets and analyses applied. A sample derivation type table 406 lists the various types of transformation. The various sample item types themselves, e.g., target, replicate, raw sample, leaf target sample, analyses and the like, are listed in a sample item type table 408. Listing the sample derivation types and sample item types allows easy reprogramming to accommodate changes in sample processing procedures.
Associated with samples are attributes. Some of the attributes are strings or values identifying concentrations, sample preparation dates, expiration dates, and the like. Other attributes identify characteristics that are highly useful in searching for genes or ESTs of interest such as the disease state of tissue, the organ, or species from which a sample is extracted. Attributes are listed in a sample item attribute table 410. A sample item attribute map table 412 implements a many-to-many relationship between sample item attribute table 410 and a sample item table 402. A sample may have more than one attribute, and an attribute can describe more than one sample item.
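A many-to-many mapping of this kind is conventionally implemented with a junction table. The sketch below, using Python's standard sqlite3 module, shows one possible shape. The table names follow the text, but the column names, the sample names, and the attribute rows are hypothetical, and the attribute type is stored inline as text for brevity rather than in a separate type table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample_item (
    sample_item_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL
);
CREATE TABLE sample_item_attribute (
    attribute_id   INTEGER PRIMARY KEY,
    attribute_type TEXT NOT NULL,
    value          TEXT NOT NULL
);
-- Junction table: one row per (sample item, attribute) pairing.
CREATE TABLE sample_item_attribute_map (
    sample_item_id INTEGER NOT NULL REFERENCES sample_item(sample_item_id),
    attribute_id   INTEGER NOT NULL REFERENCES sample_item_attribute(attribute_id),
    PRIMARY KEY (sample_item_id, attribute_id)
);
INSERT INTO sample_item VALUES (1, 'leaf-1'), (2, 'leaf-2'), (3, 'leaf-3');
INSERT INTO sample_item_attribute VALUES
    (10, 'organ', 'liver'),
    (11, 'disease state', 'tumor'),
    (12, 'disease state', 'healthy');
INSERT INTO sample_item_attribute_map VALUES
    (1, 10), (1, 11),
    (2, 10), (2, 12),
    (3, 11);
""")

def samples_with(attribute_type, value):
    """Return names of sample items carrying the given attribute value."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM sample_item AS s
        JOIN sample_item_attribute_map AS m ON m.sample_item_id = s.sample_item_id
        JOIN sample_item_attribute AS a ON a.attribute_id = m.attribute_id
        WHERE a.attribute_type = ? AND a.value = ?
        ORDER BY s.name
        """,
        (attribute_type, value),
    ).fetchall()
    return [name for (name,) in rows]

liver_samples = samples_with("organ", "liver")
tumor_samples = samples_with("disease state", "tumor")
liver_tumor = sorted(set(liver_samples) & set(tumor_samples))
```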
Each attribute has an associated attribute type listed in a sample item attribute type table 414 and an associated value for the attribute. Examples of attribute types are "concentration," "preparation date," "expiration date," etc. Another example of an attribute type would be "specimen type" where possible values would correspond to "tissue," "organ culture," "purified cells," "primary cell culture," "established cell line," and the like. Another example might be "ethnic group" where different values may correspond to "East Asian" or "Native American," for example. Many attribute types may be understood to be derived from other attribute types. For example, the attribute type "ethnic group" may be derived from an attribute type "human" which is in turn derived from an attribute type "species." Some attribute types have no associated attributes but rather define levels of categorization. The derivations relating a "parent" attribute type to a "child" attribute type are listed in an attribute type derivation table 418. Any attribute type may have one or more parents or children. Different types of derivation are listed in an attribute type derivation type table 420. One representative attribute type derivation type is category-subcategory where the parent type represents a category and the child type represents the subcategory. The availability of derivation relationships among attribute types greatly facilitates the formulation of useful queries to expression mining database 124, allowing the user to readily identify attribute types of interest.
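Parent-to-child derivations among attribute types can be browsed with a simple breadth-first walk, as in the Python sketch below. The rows reuse the species, human, and ethnic group example from the text. The tuple layout and the function name descendant_types are hypothetical.

```python
# (parent type, child type, derivation type); rows mirror the example in the text.
attribute_type_derivations = [
    ("species", "human", "category-subcategory"),
    ("human", "ethnic group", "category-subcategory"),
]

def descendant_types(root, derivations):
    """Return all attribute types derived from root, nearest levels first."""
    children = {}
    for parent, child, _kind in derivations:
        children.setdefault(parent, []).append(child)
    found, queue = [], [root]
    while queue:
        for child in children.get(queue.pop(0), []):
            if child not in found:  # a type may have several parents
                found.append(child)
                queue.append(child)
    return found

under_species = descendant_types("species", attribute_type_derivations)
```

A user interface could use such a walk to let the user drill down from a broad category to the attribute types of interest.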
Tables related to information about experiments are denoted in Fig. 4A with the letter "B." An experiment table 424 lists experiments whose results are available for querying. A data map table 426 lists entries corresponding to sets of genes or ESTs to be investigated. Each set corresponds to a collection of experiments performed to investigate the genes in the set. An experiment set table 428 lists associations between experiments and entries in data map table 426 and thus defines the collection of experiments corresponding to each gene set. An analysis set table 430 defines sets of analyses that have been performed corresponding to each gene set. Each entry defines an association between an analysis, an experiment and an entry in data map table 426.
Tables related to information about genes are denoted in Fig. 4A with the letter "C." A gene set table 432 defines membership in all sets of genes that have been defined by users to prepare for querying and mining operations or have been otherwise defined. A gene set name table 434 lists names for the gene sets. Genes belonging to gene sets are listed in a bio-item accession table 436. Each entry in bio-item accession table 436 identifies an accession number in a bio-item database. Definitions for accession numbers are stored in an accession definition table 438. A housekeeping genes table 440 lists genes with known expression levels that are used to calibrate the expression monitoring process.
Tables related to analysis information are denoted with the letter "D." Absolute expression analysis results are stored in an absolute result table 444. Each entry in absolute result table 444 references an absolute result type. Different absolute result types may include, e.g., present, marginal, absent, and unknown, indicating an estimate of the expression level of a given gene or EST. The various absolute result types are listed in an absolute result type table 446. Relative analysis results are stored in a relative analysis result table 448. Each entry in relative analysis result table 448 references a relative result type listed in a relative result type table 450. Relative analyses compare expression of a gene in two experiments. Different relative result types may include, e.g., increased, no change, decreased, and unknown, all describing the change of expression. Tables 448 and 450 are imported from expression analysis database 122 and are read-only from the viewpoint of querying and mining system 126.
Querying and mining system 126 also performs various expression analysis operations. Results of these calculations are maintained in a calculated fields table 452.
Tables related to mining and querying operations are denoted with a letter "E." At any one time, a user considers data from a collection of experiments. A list of the sample items which were used for these experiments is stored in a selected sample item table 454. Selected sample item table 454 is typically much smaller than sample item table 402, which can make query operations faster.
Each entry in a criteria set table 456 identifies a set of criteria used to query a group selected by sample item or by attribute. Each entry in a criteria set experiment table 458 identifies a set of criteria applied to gene or EST expression levels of a particular sample item belonging to a group identified by reference to criteria set table 456. A criteria set experiment detail table 460 includes entries identifying values to be applied as criteria. A user of querying and mining system 126 does not have access to information about leaf target samples but rather only about their "parents." The expression data can be recorded concerning the leaf target samples. Entries in criteria set experiment table 458 can be associated with sample items in sample item table 402 and leaf target samples corresponding to these sample items by means of a criteria set experiment leaf table 462.
Various other tables can be included in embodiments according to the present invention and are denoted with a letter "F." A user preferences table 464 stores references to user preference files that record the preferences of individual users of querying and mining system 126. Users may wish to store functions used for normalization of expression data for later use. A normalization adjustment function table 466 lists information about normalization and other transformation functions. Users may wish to store functions used to average expression data collected from related replicates. Descriptions of these averaging functions are stored in a replicate average function table 468.
Fig. 5A illustrates a flowchart 501 of simplified process steps in a particular representative embodiment according to the invention for mining a plurality of experiment information for a pattern. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. In a step 502, information from experiments and chip designs is collected. Then, in a step 504, experimental analyses to mine are selected. In a step 506, one or more sample attributions are defined. In a step 508, resulting information is determined from the experimental analyses by mining to form a plurality of resulting information. This resulting information can include one or more resulting gene sets. Finally, a step 510 formats the resulting information for viewing by a user. The combination of these steps can provide the user with the ability to access experiment information.
Fig. 5B illustrates a flowchart 503 of simplified process steps in an alternative embodiment according to the invention for working with expression information. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. In a step 512, information about a plurality of results of a plurality of experimental analyses is collected. Then, in a step 514, information about samples and information about the plurality of experiments is gathered. Next, in a step 516, one or more attributes are added to the information about the experiments. Then, in a step 518, the plurality of results of experiments information is transformed to form a plurality of transformed information. Transformation can comprise normalization, denormalization, scaling, aggregation, and the like. Subsequently, in a step 520, the plurality of transformed information is mined. Then, in a step 522, the results of the mining are visualized for display to the user. Finally, in a step 524, conclusions are recorded.
Fig. 6A illustrates a representative block flow diagram of simplified process steps in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Block flow diagram 601 includes an input data warehouse 602, a transformation step 604 to produce an output data mart 606 and a mining process step 608. Input data warehouse 602 can comprise a laboratory information management system and other databases. Data warehouse 602 in a particular embodiment can include genomic information and chip design information, as well as other useful information in the laboratory expression analysis process.
Fig. 6B illustrates a simplified block diagram of a representative data warehouse such as data warehouse 602 of Fig. 6A in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Data warehouse 602 comprises a laboratory information management system 610 and a plurality of published databases including published database 612. In one particular embodiment, a chip design component 614 can also be included in data warehouse 602. Yet further, genomic information component 616 can also be a part of data warehouse 602. In some embodiments, other reference databases 618 can also be part of data warehouse 602. Many embodiments can also include other information or may omit any of these particular components without departing from the scope of the present invention.
Data transformation step 604 of Fig. 6A can comprise in a particular embodiment according to the present invention a normalization and adjustment step. Normalization and adjustment can include functions tracked by analysis type and/or functional type. In some embodiments, a VBA function or independent applet can be added or removed. Additionally, in many embodiments, a user may selectively omit some transformations according to a preference. Data transformation step 604 can include a replicate step in which a user can manipulate replicates in ways similar to normalizations and adjustments. Further, in many embodiments a user can identify derivation-type replicates using a sample identification. Yet further, in some embodiments, custom selection of replicates can be embedded in an applet.
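The transformation step can be pictured with a small sketch in which a normalization is applied to each replicate, optionally skipped according to a user preference, and the replicates are then averaged. The specific normalization (scaling each replicate to a common mean intensity), the target value, and all function names are assumptions chosen for illustration; the text does not specify which functions are used.

```python
from statistics import mean

def scale_to_target(values, target=100.0):
    """Scale one replicate so that its mean intensity equals target (assumed method)."""
    factor = target / mean(values)
    return [v * factor for v in values]

def average_replicates(replicates):
    """Average gene by gene across replicates of equal length."""
    return [mean(column) for column in zip(*replicates)]

def transform(replicates, normalize=True):
    """Optionally normalize each replicate, then average the replicates."""
    if normalize:
        replicates = [scale_to_target(r) for r in replicates]
    return average_replicates(replicates)
```

With `normalize=False` the user preference to omit the transformation is honored and the raw values are averaged directly.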
Fig. 6C illustrates a representative data mart such as data mart 606 of Fig. 6A in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Representative data mart 606 can comprise an experiment collection 620. Information and results of the experiment collection can be forwarded to an expression result 622. In many embodiments, a plurality of samples 624, which can have one or more sample attributes, can further have a relationship to expression result 622. A plurality of genes 626 can also be included in data mart 606. Finally, in a presently preferable embodiment, time may be treated as a dimension 628 of expression result 622. Other methods of organizing data in data marts can also be used without departing from the scope of the present invention. In a particular embodiment, experiments can be added to or removed from experiment collection 620. Further, in many embodiments, the same experiment collection can be mined for a plurality of purposes. Yet further, experiment collection 620 can be subdivided into one or more subsets of experiments to be mined.
Fig. 6D illustrates a representative organization of samples and targets such as samples 624 of Fig. 6C in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Samples and targets can allow a user to describe stages of an experiment. At a top level is a raw sample. Fig. 6D illustrates sample 624 that comprises a raw sample 630. Below the raw sample are one or more replicates. Two replicates, a replicate 632 and a replicate 634, comprise raw sample 630. Replicates can comprise targets.
Replicate 632 is a target treated with a drug A. Replicate 634 is a target treated with drug B. One or more leaf targets can comprise a target. For example, leaf targets 636, 638, 640 and 642 comprise target 632. Leaf targets 644, 646, 648 and 650 comprise target 634. Experimental analyses can be associated with the leaf targets. Fig. 6D illustrates an experimental analysis 652 and an experimental analysis 654 associated with leaf target 632. In a presently preferable embodiment, experimental analyses can be recursively defined, i.e., an experimental analysis can comprise one or more experimental analyses. In a particular embodiment, intermediate levels can be defined by the user. Other levels can be included and other organizations may be used without departing from the scope of the claims of the present invention.
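The hierarchy of Fig. 6D lends itself to a simple tree sketch. The class below is a hypothetical illustration, not the schema of the described system; node labels reuse the reference numerals from the text. Pooling of replicates, as in Fig. 6E, would need a graph rather than a strict tree and is not covered here.

```python
from dataclasses import dataclass, field

@dataclass
class SampleNode:
    """One level of the sample hierarchy: raw sample, target, or leaf target."""
    label: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Return the nodes with no children, i.e. the leaf targets."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

raw_630 = SampleNode("raw sample 630", [
    SampleNode("target 632 (drug A)",
               [SampleNode(f"leaf target {n}") for n in (636, 638, 640, 642)]),
    SampleNode("target 634 (drug B)",
               [SampleNode(f"leaf target {n}") for n in (644, 646, 648, 650)]),
])
```

Calling `raw_630.leaves()` collects the eight leaf targets, which are the nodes to which experimental analyses would be attached.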
Fig. 6E illustrates another representative organization of samples and targets such as samples 624 of Fig. 6C in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Fig. 6E illustrates a raw sample 670 that represents a piece of extracted tissue, for example. Raw sample 670 has been cleaved into multiple raw samples, such as raw samples 672, 673 and 674. The raw samples are the input to sample preparation system 114 of Fig. 1. Sample preparation system 114 prepares targets, such as target 676 corresponding to raw sample 672. The target can be a fluid including mRNA or other expression indicator. Target 676 has been split into multiple replicates, such as replicates 677, 678 and 679. Replicates 678 and 680 have been pooled to form another target, target 682. The individual "targets" that are applied to chips are the leaf target samples. Each application of a "leaf target sample" to a chip represents an experiment. Leaf target sample 684 is an example. In a presently preferable embodiment, one or more experimental analyses can be associated with a particular leaf target sample. Here, analyses 686 and 688 are associated with leaf target sample 684. Further, an experimental analysis can be defined in terms of one or more other experimental analyses.
Fig. 6F illustrates a representative organization of a plurality of attributes such as attribute 628 of Fig. 6C in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Fig. 6F illustrates a plurality of attributes having a non-hierarchical structure. In a presently preferable embodiment, an unlimited number of attributes can be assigned to any particular sample. Yet further, different samples can have the same attributes. Fig. 6F illustrates an organism species 660 having a relationship with a plurality of attributes such as human attribute 662, mouse attribute 664, corn attribute 666 and yeast attribute 668. The "strain" and "race" windows are examples of attributes. Other arrangements and attributes can be used in various embodiments without departing from the scope of the claims of the present invention.
In some embodiments, genes 626 of Fig. 6C can be combined into one or more gene sets. Gene sets can be described by various users and in at least one particular embodiment are not shared among users, but can be shared by users in other embodiments. A user can copy other users' gene sets and can edit or delete gene sets. In a presently preferable embodiment, gene sets can be created or saved during mining of the data mart. In some embodiments, one or more functional operations can be applied to gene sets, including logical operations such as union and intersection, and arithmetic operations such as addition, subtraction, scaling, and the like.
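The logical operations on gene sets map directly onto set algebra. The sketch below uses Python's built-in set type with invented gene identifiers and set names; it illustrates the idea only and does not reflect how gene sets are actually stored in the described tables.

```python
# Invented gene identifiers and set names, for illustration only.
responders_a = frozenset({"G1", "G2", "G3"})
responders_b = frozenset({"G2", "G3", "G4"})

either = responders_a | responders_b   # union: genes in at least one set
both = responders_a & responders_b     # intersection: genes in both sets
only_a = responders_a - responders_b   # difference: genes unique to the first set
```

Each result is itself a gene set, so it could in principle be saved and reused in later mining steps.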
Fig. 7A illustrates a representative experiment collection screen 701 of a user interface in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Screen 701 enables a user to interact with an experiment collection comprised in expression mining database 124 of Fig. 1. Screen 701 comprises an experiment collection selection tab 702 shown with four experiment collections, such as experiment collections 704 and 706. Other experiment collections can be added as needed. Other formats can also be used for presenting this information to a user in various embodiments according to the present invention.
Fig. 7B illustrates an experiment selection screen 703 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Experiment selection screen 703 comprises an experiment tab 730. A plurality of experiments is indicated in two scrolling windows, an experiments selected window 734 and an experiments available window 736. Selection buttons 738a and 738b enable various experiments to be moved between experiment scrolling selection windows 734 and 736. Experiment selection window 736 includes a plurality of experiments. One or more filters may be applied to the experiment data to limit the number of experiments depicted in experiment scrolling selection windows 734 and 736 using the filter mechanism at the bottom of the screen. The filter mechanism 744 comprises a column selection field 746 and a selection value input field 748. A user may select a particular field by which to screen experiments using column selection field 746 and then enter a desired value in value input field 748. Then, by clicking filter button 750, the user can apply the filter to the experiments in the collection so that only experiments in which the column is set to the selected value will be depicted in experiment selection scroll windows 734 and 736.
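The column-and-value filter can be sketched as a one-line predicate over experiment records. The record layout, column names and values below are hypothetical; the text does not enumerate the columns available in the collection.

```python
# Hypothetical experiment records; column names and values are invented.
experiments = [
    {"name": "exp-1", "tissue": "liver", "chip": "chip-A"},
    {"name": "exp-2", "tissue": "kidney", "chip": "chip-A"},
    {"name": "exp-3", "tissue": "liver", "chip": "chip-B"},
]

def filter_experiments(records, column, value):
    """Keep only the experiments whose chosen column equals the entered value."""
    return [record for record in records if record.get(column) == value]
```

For example, `filter_experiments(experiments, "tissue", "liver")` would leave only the two liver experiments visible in the selection windows.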
Fig. 7C illustrates a selected experiment collection screen 705 having an analysis tab 751 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Screen 705 comprises two scrolling selection windows, an analyses selected window 752 and an analyses available window 754. Selection keys 756 and 758 may be used to move various analyses between scrolling selection windows 752 and 754. Similarly, a filter mechanism provided at the bottom of screen 705 enables a user to screen the analyses depicted in scrolling selection windows 752 and 754 by selecting a particular column using column selection field 760 and inputting a desired value into value input field 762 and then clicking filter button 764 to apply the filter to the analyses in the experiment collection.
Fig. 7D illustrates a representative sample selection screen 707 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Screen 707 enables a user to view the results of selections made on one or more samples. Screen 707 comprises a plurality of selections including a sample selection 770, a sample-type selection 771 and an attribute-type selector 772. A previous/next button pair 774 and a select button 775 enable searching and selecting, respectively.
Fig. 7E illustrates a representative sample and attribute management screen 709 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Using screen 709, users can add, delete or rename samples, attributes, sample and attribute types and relationships between any of these. Screen 709 comprises a samples and attributes section 722 and a relationships section 724. Item selection window 776 of sample and attribute section 722 provides functions that enable the user to select the type of new item, sample, attribute, and the like. Function buttons 777 enable the user to select operations such as add new, rename, delete and the like. If the user elects to create a new item, then screen 711 of Fig. 7F is displayed. Screen 711 enables the user to create new items. The user can enter a name for the item in a new item field 780 and an item type in item type field 784 of screen 711. Otherwise, the user can work with relationships using the relationship section 724 in screen 709 of Fig. 7E.
Relationship selection window 778 of relationship section 724 enables the user to select the type of relationship, such as a relationship between sample item to sample item, a relationship between attribute and sample item or a relationship between attribute type to attribute type, for example. Function buttons 779 enable the user to select operations such as add new, delete and the like. If the user elects to create a new relationship, then screen 713 of Fig. 7F is displayed. Screen 713 enables the user to create new relationships. The user can enter a source of the relationship in a source window 782, a parent in the parent window 786 and a type of relationship in the derivation type window 788.
Fig. 7G illustrates a representative data mining option management screen 715 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Screen 715 illustrates a plurality of tabs, including a queries and charts tab 790, a patterns tab 792 and a gene set comparison tab 794. The user can specify some grouping parameters using the group by functions of queries and charts tab 790 in order to begin data mining.
Fig. 7H illustrates an experiment mining screen 717 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Using the functions of screen 717, users can enter criteria for data selection, such as for example, what gene sets to use and the like. Screen 717 includes a plurality of sample items, such as sample item 796. A group selection field 798 enables a user to select from a plurality of groups in the experiment collection. One or more gene sets can be selected using the gene set selection field 800. Gene sets can be all genes represented by a particular gene chip, or a subset. A default of all gene sets on a particular gene chip is provided in one particular embodiment, but other defaults can be used. A presence measure of the gene expression within the group can be specified using the expression percentage field 802. When the user has specified the search parameters using these fields, depressing the execute button 801 starts the data mining.
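One plausible reading of the expression percentage criterion is that a gene qualifies when it is called present in at least a given percentage of the experiments in the selected group. The sketch below implements that reading; the interpretation, the data layout and the function name are all assumptions, as the text does not define the measure precisely.

```python
def genes_present(calls, min_percent):
    """Return genes called 'present' in at least min_percent of experiments.

    calls maps a gene identifier to its list of absolute calls, one per
    experiment in the group (an assumed layout).
    """
    selected = []
    for gene, gene_calls in calls.items():
        present = sum(call == "present" for call in gene_calls)
        if 100.0 * present / len(gene_calls) >= min_percent:
            selected.append(gene)
    return sorted(selected)

# Invented calls for two genes across four experiments.
calls = {
    "G1": ["present", "present", "absent", "present"],
    "G2": ["absent", "marginal", "present", "absent"],
}
```

Under this reading, raising the percentage narrows the result toward genes that are consistently expressed across the group.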
Fig. 7I illustrates a selected data screen 719 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Data selection screen 719 illustrates the data that meet the criteria specified by the user in experiment mining screen 717 of Fig. 7H. Data selection screen 719 illustrates a plurality of leaf parents, including leaf parent 804. Screen 719 also illustrates experiment replications 805, bio items 806 and results measured 807 during the experiment for each leaf parent. Users can export the results of the mining using export button 808 and/or can save the results of the mining using save gene set button 809.
Fig. 7J illustrates a bar chart visualization screen 721 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Bar chart visualization screen 721 comprises a display area having a display of the data in the experiment collection. A quantity to be visualized can be selected from select value field 814. Experimental results 810 and 812 indicate differences in expression for a particular gene for the quantity selected by the user with field 814.
Fig. 7K illustrates a scatter plot visualization screen 723 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Scatter plot visualization screen 723 comprises a display area 819 having a display of the data in the experiment collection. While display area 819 illustrates an X-Y plot, other forms of data visualization, such as bar charts, graphs, pie-charts and the like, are contemplated by various embodiments according to the present invention.
Fig. 7L illustrates pattern search screens 725 and 727 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Gene pattern searching enables the user to determine relationships such as which genes behave similarly when exposed to a certain drug, and the like. Selecting the "pattern" tab on screen 725 displays information entry devices for entering search criteria, including a gene patterns field 820.
By specifying search on gene patterns, the user can be presented with gene pattern search screen 727. The user can select a plurality of gene sets to compare using gene set name fields 822 and 824. A measurement selection field 826 enables the user to select a measurement of interest as a basis of the comparison.
Fig. 7M illustrates gene set comparison screens 729 and 731 in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Gene set comparisons enable the user to determine relationships such as which gene sets include particular genes, exclude particular genes, or contain functional combinations of genes. Selecting the "gene set comparison" tab on screen 729 displays information about gene sets that can be selected by the user for comparison. Screen 729 illustrates a plurality of gene sets, including gene sets 830, 832 and 834. After specifying gene sets of interest, the user can be presented with gene comparison screen 731. The user can select a plurality of genes as bases for comparing the gene sets selected in screen 729 by checking one or more of a plurality of selection windows, such as selection windows 836 and 838.
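A gene set comparison of this kind can be sketched as a search for gene sets that include some genes and exclude others. The set names below borrow the reference numerals from the text, but their membership and the function itself are invented for illustration.

```python
# Invented membership for the gene sets named in the text.
gene_sets = {
    "gene set 830": {"G1", "G2"},
    "gene set 832": {"G2", "G3"},
    "gene set 834": {"G1", "G3"},
}

def matching_gene_sets(sets_by_name, include=(), exclude=()):
    """Names of gene sets containing every gene in include and none in exclude."""
    include, exclude = set(include), set(exclude)
    return sorted(
        name for name, genes in sets_by_name.items()
        if include <= genes and not (exclude & genes)
    )
```

Checking a gene in a selection window would correspond to adding it to `include` or `exclude` in this sketch.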
Figs. 7N-7O illustrate sample data management screens 733 and 735 in particular embodiments according to the present invention. These diagrams are merely illustrations and should not limit the scope of the present invention. One of ordinary skill in the art would recognize other variations, modifications and alternatives. Fig. 7N illustrates gene set management screen 733. This screen enables the user to perform a variety of tasks with genes and gene sets, such as add, remove, create and copy gene sets, and add and remove genes within gene sets, and the like. Fig. 7O illustrates update gene set screen 735. This screen enables the user to specify one or more genes to be removed from the database.
LABORATORY INFORMATION MANAGEMENT
One embodiment of the present invention operates in the context of a system for analyzing biological or other materials using arrays that themselves include probes that may be made of biological materials such as RNA or DNA. The VLSIPS™ and GeneChip™ technologies provide methods of making and using very large arrays of polymers, such as nucleic acids, on very small chips. See U.S. Patent No. 5,143,854 and PCT Patent Publication Nos. WO 90/15070 and 92/10092, each of which is hereby incorporated by reference for all purposes. Nucleic acid probes on the chip are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the "target" nucleic acid).
It should be understood that the probes need not be nucleic acid probes but may also be other polymers such as peptides. Peptide probes may be used to detect the concentration of peptides, polypeptides, or polymers in a sample. The probes should be carefully selected to have bonding affinity to the compound whose concentration they are to be used to measure.
Fig. 1' illustrates an overall system 100' for forming and analyzing arrays of biological materials such as RNA or DNA. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. A chip design system 104' is used to design arrays of polymers, for example biological polymers such as RNA or DNA. Chip design system 104' may be, for example, an appropriately programmed Sun Workstation or personal computer or workstation, such as an IBM PC equivalent, including appropriate memory and a CPU. Chip design system 104' obtains inputs from a user regarding chip design objectives including characteristics of genes of interest, and other inputs regarding the desired features of the array. Optionally, chip design system 104' may obtain information regarding a specific genetic sequence of interest from bioinformatics database 102' or from external databases such as GenBank. The output of chip design system 104' is a set of chip design computer files in the form of, for example, a switch matrix, as described in PCT application WO 92/10092, and other associated computer files. Systems for designing chips for sequence determination and expression analysis are disclosed in U.S. Patent No. 5,571,639 and in PCT application WO 97/10365, the contents of which are herein incorporated by reference. The chip design files are input to a mask design system (not shown) that designs the lithographic masks used in the fabrication of probe arrays, i.e., arrays of molecules such as DNA. The mask design system generates mask design files that are then used by a mask construction system (not shown) to construct masks or other synthesis patterns such as chrome-on-glass masks for use in the fabrication of polymer arrays.
The masks are used in a synthesis system (not shown). The synthesis system includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip. The synthesis system includes a light source and a chemical flow cell on which the substrate or chip is placed. A mask is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times for deprotection of selected regions of the chip. Selected chemical reagents are directed through the flow cell for coupling to deprotected regions, as well as for washing and other operations. The substrates fabricated by the synthesis system are optionally diced into smaller chips. The output of the synthesis system is a chip ready for application of a target sample. Information about the mask design, mask construction, and probe array synthesis systems is presented by way of background.
A biological source 112' is, for example, tissue from a plant or animal. Various processing steps are applied to material from biological source 112' by a sample preparation system 114'. These steps may include isolation of mRNA and precipitation of the mRNA to increase concentration. The result of the various processing steps is a target sample ready for application to the chips produced by the synthesis system 110'. Sample preparation methods for expression analysis are discussed in detail in WO97/10365. The prepared samples include nucleic acid sequences such as RNA or DNA. When the sample is applied to the chip by a sample exposure system 116', the nucleic acids in the sample may or may not bond to the probes. The nucleic acids have been tagged with fluorescein labels to determine which probes have bonded to nucleic acid sequences from the sample. The prepared samples will be placed in a scanning system 118'. Scanning system 118' includes a detection device such as a confocal microscope or CCD (charge-coupled device) that is used to detect the location where labeled receptors have bound to the substrate. The output of scanning system 118' is one or more image files indicating, in the case of fluorescein labeled receptor, the fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate. Since higher photon counts will be observed where the labeled target has bound more strongly to the array of polymers, and since the monomer sequence of the polymers on the substrate is known as a function of position, it becomes possible to determine the sequence(s) of the target on the substrate that are complementary to the probes.
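The last step of this reasoning, going from intensity by position to complementary target sequences, can be illustrated with a toy lookup. Everything in the sketch is assumed for illustration: the probe layout, the intensity values, the fixed threshold used to decide that a probe has bound, and the function names. Real analysis methods are considerably more involved.

```python
# Toy data: probe sequence and measured intensity at each (row, column) position.
probe_at = {(0, 0): "ACGT", (0, 1): "TTGA", (1, 0): "GGCA"}
intensity = {(0, 0): 950, (0, 1): 40, (1, 0): 720}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]

def inferred_target_sequences(intensities, probes, threshold):
    """Targets complementary to probes whose intensity meets an assumed threshold."""
    return sorted(
        reverse_complement(probes[position])
        for position, value in intensities.items()
        if value >= threshold
    )
```

Because the probe at each position is known in advance, the intensity image alone is enough to recover which complementary sequences were present in this simplified picture.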
The image files and the design of the chips are input to an analysis system 120' that, e.g., calls base sequences, or determines expression levels of genes or expressed sequence tags (ESTs). The expression level of a gene or EST is herein understood to be the concentration within a sample of mRNA or protein that would result from the transcription of the gene or EST. Such analysis techniques are disclosed in WO97/10365 and U.S. App. No. 08/531,137, the contents of which are herein incorporated by reference.
An expression analysis database 122' maintains information used to analyze expression and the results of expression analysis. Contents of expression analysis database 122' may include tables listing analyses performed, analysis results, experiments performed, sample preparation protocols and parameters of these protocols, chip designs, etc. Details of one embodiment of expression analysis database 122' are described in U.S. Patent Application No. 09/122,167, entitled METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE, filed on July 24, 1998, the contents of which are incorporated herein by reference for all purposes.
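By way of illustration only, the kinds of tables listed above could be arranged as in the following sketch, which uses an in-memory SQLite database. Every table and column name here is an assumption made for the example; the actual schema is the one described in the referenced application.

```python
import sqlite3

# Hypothetical schema sketch; names are illustrative, not the patented schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chip_design (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE protocol (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    parameters TEXT              -- protocol parameters, e.g. serialized text
);
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    chip_design_id INTEGER REFERENCES chip_design(id),
    protocol_id INTEGER REFERENCES protocol(id)
);
CREATE TABLE analysis (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER REFERENCES experiment(id),
    algorithm TEXT
);
CREATE TABLE analysis_result (
    analysis_id INTEGER REFERENCES analysis(id),
    gene TEXT NOT NULL,
    expression_level REAL
);
""")

# List the tables that were created.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
```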
One or more instantiations of expression analysis database 122' may contain information concerning the expression of many genes or ESTs as collected from many different tissue samples. It would be useful to use this information to investigate questions such as, e.g., 1) which genes or ESTs are upregulated (expressed more) or downregulated (expressed less) in diseased tissue, 2) how does gene expression vary among organs and tissue types within a species, 3) how does gene expression vary among species which share common genes, 4) how does gene expression respond to various disease treatment regimes, 5) how does gene expression vary with progression of disease, etc. To facilitate investigations of this kind, an expression mining database 124' is provided. Expression mining database 124' may include duplicate representations of data in expression analysis database 122'. Expression mining database 124' may also include various tables to facilitate mining operations conducted by a user who operates a querying and mining system 126'. Querying and mining system 126' includes a user interface that permits an operator to make queries to investigate expression of genes and ESTs and answer the types of questions identified above. An example of a querying and mining system is described in a commonly owned U.S. Patent Application No. 09/122,434, entitled GENE EXPRESSION AND EVALUATION SYSTEM, filed July 24, 1998.
Chip design system 104', analysis system 120' and control portions of exposure system 116', sample preparation system 114', and scanning system 118' may be appropriately programmed computers such as a Sun workstation or IBM-compatible PC. An independent computer for each system may perform the computer-implemented functions of these systems or one computer may combine the computerized functions of two or more systems. One or more computers may maintain expression analysis database 122', expression mining database 124', and querying and mining system 126' independent of the computers operating the systems of Fig. 1'.
Fig. 2A' depicts a block diagram of a host computer system 210' suitable for implementing a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 2A' illustrates a host computer system 210' including a bus 212' which interconnects major subsystems such as a central processor 214', a system memory 216' (typically RAM), an input/output (I/O) adapter 218', an external device such as a display screen 224' via a display adapter 226', a keyboard 232' and a mouse 234' via I/O adapter 218', a SCSI host adapter 236', and a removable disk drive 238' operative to receive a removable disk 240'. SCSI host adapter 236' may act as a storage interface to a fixed disk drive 242' or a CD-ROM player 244' operative to receive a CD-ROM 246'. Fixed disk drive 242' may be a part of host computer system 210' or may be separate and accessed through other interface systems. A network interface 248' may provide a direct connection to a remote server via a telephone link or to the Internet. Network interface 248' may also connect to a local area network (LAN) or other network interconnecting many computer systems.
Many other devices or subsystems (not shown) may be connected in a similar manner. Also, it is not necessary for all of the devices shown in Fig. 2A' to be present to practice the present invention, as discussed below. The devices and subsystems may be interconnected in different ways from that shown in Fig. 2A'. The operation of a computer system such as that shown in Fig. 2A' is readily known in the art and is not discussed in detail in this application. Code to implement the present invention may be operably disposed or stored in computer-readable storage media such as system memory 216', fixed disk 242', CD-ROM 246', or floppy disk 240'.
Fig. 2B' depicts a simplified diagram of a network 260' interconnecting multiple computer systems 210a'-210e'. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Network 260' may be a local area network (LAN), wide area network (WAN), etc. Bioinformatics database 102' and the computer-related operations of the other elements of Fig. 2B' may be divided amongst computer systems 210' in any way with network 260' being used to communicate information among the various computers. Portable storage media such as removable disks may be used to carry information between computers instead of network 260'.
Fig. 3A' depicts a flowchart 301' of simplified process steps for managing information about a plurality of experiments conducted on a plurality of samples in a particular representative embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Each experiment can provide an indication of a degree of expression of particular genetic sequences in a sample. In a step 310', at least one of the plurality of samples is registered with a centralized database. Next, in a step 312', a plurality of information about the plurality of samples is tracked. The result of step 312' is that the information about samples can be incorporated into the database. Then, in a step 314', a plurality of information about the plurality of experiments is tracked. Changes to the experimental environment in the laboratory are reflected in the database by the function of step 314'. Now, in a step 316', a sample history is produced from the information in the database. The sample history describes the state of the plurality of samples.
In a step 318', the information about the plurality of experiments and the information about the plurality of samples is filtered according to one or more filters selected by a user to produce expression sequence information. Finally, in an optional step 320', the expression sequence information resulting from the operation of the experiments in the laboratory can be published on a public database which can be accessed by a web based user interface or other means.
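By way of illustration only, steps 310' through 318' can be sketched with a small in-memory stand-in for the centralized database. The class names, field names, and example values are assumptions made for this sketch and do not describe the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SampleRecord:
    name: str
    sample_type: str
    events: list = field(default_factory=list)       # sample information (step 312')
    experiments: list = field(default_factory=list)  # experiment information (step 314')


class CentralDatabase:
    """Hypothetical in-memory stand-in for the centralized database."""

    def __init__(self):
        self.samples = {}

    def register(self, name, sample_type):           # step 310'
        self.samples[name] = SampleRecord(name, sample_type)

    def track_sample(self, name, event):             # step 312'
        self.samples[name].events.append(event)

    def track_experiment(self, name, experiment):    # step 314'
        self.samples[name].experiments.append(experiment)

    def sample_history(self, name):                  # step 316'
        return list(self.samples[name].events)

    def filter(self, predicate):                     # step 318'
        return [s.name for s in self.samples.values() if predicate(s)]


db = CentralDatabase()
db.register("S1", "liver")
db.register("S2", "kidney")
db.track_sample("S1", "registered")
db.track_sample("S1", "RNA extracted")
db.track_experiment("S1", {"name": "E1", "probe_array_type": "ArrayTypeA"})
liver_samples = db.filter(lambda s: s.sample_type == "liver")
```

The optional publishing step 320' is omitted from the sketch, since it depends on the target database and interface.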
Fig. 3B' depicts a flowchart 303' of simplified process steps for viewing the results of a plurality of samples in another embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. The results can be stored in one or more databases. In a step 322', the user specifies a database to query. Next, in a step 324', one or more queries are submitted to the database in order to form a result. Then, in a step 326', the result can be viewed by the user by means of a display. In a step 328', the result can be filtered according to one or more user-specified filters. Finally, in a step 330', the filtered result can be placed into a graphical form.
Fig. 3C' provides a representative flow chart 305' of simplified process steps for managing information about a plurality of experiments conducted on samples in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. In step 330', the sample is registered with a database. Then, in a step 332', the experiment setup is performed. In a step 334', aliquoting is performed. Then, in step 336', RNA is extracted. A polymerase chain reaction (PCR) is performed on the RNA in a step 338'. In a step 340', cRNA is labeled. In a step 342', fragmentation is performed. Hybridization is performed in a step 344'. In a step 346', scanning of the hybridized chip is performed. Then, in a step 348', grid alignment is performed. Cell average analysis is performed in a step 350'. In a step 352', probe array analysis is performed, and in a step 354', a composite analysis is performed.
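By way of illustration only, the sequence of steps 330' through 354' can be held as an ordered list so that software can determine which step a sample is waiting on. The strictly linear ordering enforced below is an assumption of this sketch; embodiments may allow a sample to enter the process at other points.

```python
# Ordered workflow corresponding to steps 330' through 354' (names are illustrative).
WORKFLOW = [
    "sample registration", "experiment setup", "aliquoting", "RNA extraction",
    "PCR", "cRNA labeling", "fragmentation", "hybridization", "scanning",
    "grid alignment", "cell average analysis", "probe array analysis",
    "composite analysis",
]


def next_step(completed):
    """Return the first step not yet completed, or None when all are done."""
    for step in WORKFLOW:
        if step not in completed:
            return step
    return None


def can_run(step, completed):
    """A step may run only when every earlier step has been completed."""
    index = WORKFLOW.index(step)
    return all(earlier in completed for earlier in WORKFLOW[:index])


done = {"sample registration", "experiment setup"}
pending = next_step(done)
```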
Fig. 4A' illustrates a representative database structure in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 4A' illustrates a client workstation 401', which can be one of the workstations 210' of Fig. 2B', for example, that can be interconnected with one or more of a plurality of databases. For example, GATC database 403' contains a plurality of gene chip results in GATC format. GATC format provides a standardized interface for gene chip data across multiple systems. Reference may be had to http://www.gatconsortium.org for documents entitled "Software Specifications" and "Database Schema," incorporated herein by reference in their entirety for all purposes, for further information about GATC. Database 405' provides data mining information, and can include FAQs and preferences. Database 407' comprises annotations, descriptions and URLs for gene information. Embodiments can include all of the above databases, or can comprise a subset of the databases, or still further can include other databases without departing from the scope of the claimed invention.
The database structure of Fig. 4A' can provide data management functions, data publishing functions, and integration with gene chip clients such as client 401'. Data management functions can comprise a Laboratory Information Management System (LIMS). Embodiments implementing LIMS according to the present invention can provide functions of data tracking, such as process inputs, process outputs and process environments. Data security functions, such as authentication, access permissions and privileges, can include separating owners having write access from user groups having read-only access. Data sharing functions can provide for group access to data. Data publishing and sharing can be facilitated by compliance with a standardized data format. In a presently preferred embodiment, GATC format can be used. This standardized format provides cross-system access to gene chip data. In a preferable embodiment, the database server can be an Internet server providing web browser access. Embodiments can include scripting capability and can provide analysis functions at the server. Some embodiments can provide communications with the database application through web applications, such as browsers and the like, and gene chip interfaces. Databases can be embodied in a server such as an SQL server, an ORACLE server and the like. The database server can be resident on a number of platforms such as ORACLE NT, UNIX and the like.
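By way of illustration only, the separation of owners having write access from user groups having read-only access can be sketched as follows. The data structure and function names are assumptions made for the example; a deployed system would more likely rely on the permission facilities of the database server itself.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    owners: set = field(default_factory=set)         # users with write access
    reader_groups: set = field(default_factory=set)  # groups with read-only access


def can_write(dataset, user):
    """Only owners may modify the dataset."""
    return user in dataset.owners


def can_read(dataset, user, user_groups):
    """Owners may read; so may any user belonging to a permitted group."""
    return user in dataset.owners or bool(dataset.reader_groups & set(user_groups))


# Hypothetical users and groups, for illustration only.
ds = Dataset("liver_project", owners={"alice"}, reader_groups={"oncology"})
```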
Fig. 4B' illustrates a data source selection window 409' having a plurality of data sources from which gene and experiment information can be obtained, searched, and manipulated in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 4B' illustrates a plurality of different database formats including, but not limited to, MICROSOFT EXCEL files, text files, MICROSOFT ACCESS 97 Database, AlfaPublish, DataMiningInfo, GeneInfo, JetForm ASCII files, JetForm dBase, JfDbFetchDBF, JfSample, JetForm Filler Example, Forms Track, JetForm Excel, JetForm Excel 5, AFFYMETRIX, Publish_Static, GeneChipLIMS, EliPublish, GEData, and others. Many embodiments according to the present invention can provide for automation of experimental data collection and analyses, as well as publication of results. Many embodiments according to the present invention can provide expression analysis, sample registration and result publication for a plurality of experiments for a particular sample, as well as for a plurality of samples. Additionally, the methods and techniques of the present invention can automate the definition of user parameters for analyses and the like.
Fig. 5A' illustrates a representative automation page in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 5A' illustrates an automation page 501' having a sample information section 502', an experiment information section 504' and a sample experiment probe array section 506'. Sample information section 502' provides fields for entering data such as a sample name, a sample type, a project name, a description of the sample and any comments. Fields for entering other data can also be included in various embodiments of the present invention. Experiment information section 504' includes fields for entering an experiment name, a probe array image identifier, a probe array type and information about the probe array such as a lot number, an analysis set, a cell average set, as well as a target database for publishing results. Section 506' provides a display for matching sample probe arrays, sample experiments and probe array identifiers. A presently preferable embodiment provides the capability to have multiple samples as well as the capability to have multiple experiments per sample.
Fig. 5B' illustrates an automation results page 503' in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Automation results page 503' provides a display of a plurality of steps in the setup and execution of an experiment and a result for a particular sample for each of the steps. For example, as illustrated by Fig. 5B', a sample first step entitled "sample demo past registration" has received a pass result. Other steps can be included in various embodiments without departing from the scope of the claims of the present invention.
Fig. 5C' illustrates a representative expression scan screen 505' in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 5C' illustrates information about a pending scan. Screen 505' includes a hybridized expression probe array image identifier field 510', which users can use to select particular probe arrays for scanning. A sample and experiment information field 512' provides information about the sample such as its name, a project, the type of sample, the user's identifier and the date, as well as information about the experiment. Probe array information field 514' provides information about the probe array image such as the identifier, the array type and the lot number. Hybridization information field 516' provides information about reagents and lot numbers. A plurality of filter fields 518' provide the capability to filter sample projects, sample types and probe array types.
Fig. 6A' illustrates a representative sample registration screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 6A' illustrates sample registration screen 601' having fields for entry of data that describe the sample. For example, screen 601' includes fields for entering a sample name 602', sample project, sample type, as well as comments and description fields. An initial process entry point field 604' enables the user to select a particular point in the laboratory's processes as a starting point. A registered samples field 606' provides a listing of samples that have been registered. A sample information field 608' provides information about the various samples.
Fig. 6B' illustrates a plurality of screens before automating laboratory information management in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 6B' illustrates screens 610' for performing experiment setup. Screens 612' provide for performing the aliquoting step. Screens 614' provide for performing RNA extraction. Screens 616' provide for performing RT PCR. Screens 618' provide for performing cRNA labeling and screens 620' provide for performing fragmentation. Other screens and different types or designs of screens can be used in various embodiments according to the present invention without departing from the scope of the claims herein.
Fig. 6C' illustrates representative hybridization screens in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 6C' illustrates a screen 621' for controlling hybridization processes. Screen 621' comprises a pending hybridization fragmented expression vessel identifier field 622'. Such hybridization fragmented expression vessels contain samples that have been fragmented. Sample and experiment information field 624' provides tracking information about samples and experiments in the hybridization process. Pending scan fields 626' provide hybridized expression and probe array image identification information. Fig. 6C' also illustrates hybridization control screen 623' and hybridization control screen 625'. Screen 623' provides information about an experiment waiting to undergo the hybridization step. Screen 625' provides information about an experiment that has completed the hybridization step.
Fig. 6D' illustrates grid alignment control screens in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 6D' illustrates a grid alignment control screen 631'. Grid alignment control screen 631' comprises a pending grid alignment display area 632' as well as a completed grid alignment display area 634'. Sample experiment information fields 636' provide information about samples and experiments in the grid alignment process. File type information field 638' provides identification information about the file type, and a probe array information field 639' provides identification information about the probe array.
Fig. 6E' illustrates a representative cell average analysis screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 6E' illustrates screen 641' having a plurality of fields for entering information about sample projects, experiment names, sample types, probe array types, user names, image data/probe array type, cell average name, image data and cell data, algorithm and other parameters. Further, a results area 642' provides information for a particular image name, a cell name, a probe array type and various parameters. A results area provides a pass/fail indication for the particular experiment.
Fig. 6F' illustrates a representative probe array analysis screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 6F' illustrates screen 651' having a plurality of fields for entering information about sample projects, experiment names, sample types, probe array types, user names, cell data/probe array type, probe array name, probe array data, algorithm and other parameters. Fig. 6F' also illustrates a results area 652' having a cell name, a probe array name, a probe array type, a parameters area and a results area for providing a pass/fail indication.
Fig. 6G' illustrates a composite analysis screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 6G' illustrates a screen 661' having a plurality of fields for entering information about sample projects, experiment names, sample types, user names, sense/anti-sense probe array, composite name, composite data, algorithm and other parameters. Additionally, screen 661' provides a results area 662' for displaying a sense chip file name, an anti-sense chip file name, a composite file name, a parameters area and a results area for providing a pass/fail indication of results.
Fig. 6H' provides a representative sample history screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Sample history screen 681' provides a historical listing of processes which have completed with respect to a particular sample.
Fig. 7A' illustrates a representative expression analysis screen for working with sets in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 7A' illustrates screen 701' having a plurality of fields including a probe array type field 710', a user name field 712', an algorithm field 714', a cell average name field 716', a parameter field 718', an existing set name field 711', a create update set name field 713', and a results area 719'. The results area provides fields for image name, cell name, probe array type, algorithm, set name and an area for indicating a pass/fail result for the expression analysis step. Some embodiments can provide support for batch analysis of experimental results and user parameter sets.
Fig. 7B' illustrates a create set name screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 7B' illustrates a screen 703' having a probe array type field 720', a probe array types used field 722', an existing set names field 724', and an area for specifying scaling and normalizations for various chips.
Fig. 7C' illustrates an expression cell data analysis screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 7C' illustrates screen 705' having a plurality of fields for describing filter parameters. Filtering can be performed on a number of fields such as the assay type, data type, probe array type, date (including month, day and year), sample project, experiment name, sample type, user name and others.
Figs. 8A'-8C' illustrate representative Expression Data Mining Tool (EDMT) screens in a particular embodiment according to the present invention. These diagrams are merely illustrations and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 8A' illustrates an EDMT screen 801'. Screen 801' comprises a plurality of areas, such as an area 802' that provides information about filters. Filters can be applied to the experimental data to narrow down the field of data on which to mine. A results area 804' provides results of the filtered data. A graphs area 806' provides a plurality of formats of graphs for viewing the data.
Fig. 8B' illustrates a filter area such as filter area 802' of Fig. 8A' in a particular embodiment according to the present invention. Fig. 8B' illustrates filter area 802' having fields for a project filter 812', a probe array filter 814', a sample-type filter 816', an operator filter 818', a sample name filter 820', an experiment filter 822' and an analysis filter 824'. Fig. 8B' also illustrates a filter results field for illustrating the type of filters being applied to the data. Queries can be described using the filters of filter area 802'. In a presently preferable embodiment, a user can select the analyses to query and then select the ranges on the results.
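By way of illustration only, a set of user-selected filters could be turned into a parameterized query as in the following sketch. The table name, the column names, and the helper function are assumptions made for the example and are not taken from the described tool.

```python
import sqlite3

# Hypothetical filter columns, loosely mirroring the filters of filter area 802'.
ALLOWED = {"project", "probe_array", "sample_type", "operator_name",
           "sample_name", "experiment", "analysis"}


def build_query(filters):
    """Build a parameterized SELECT from a {column: value} mapping."""
    unknown = set(filters) - ALLOWED
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    sql = "SELECT experiment, analysis FROM results"
    params = []
    if filters:
        clauses = []
        for column in sorted(filters):       # sorted for a stable clause order
            clauses.append(f"{column} = ?")  # values are bound, never interpolated
            params.append(filters[column])
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (project, probe_array, sample_type, operator_name, "
    "sample_name, experiment, analysis)"
)
conn.executemany("INSERT INTO results VALUES (?,?,?,?,?,?,?)", [
    ("P1", "TypeA", "liver", "op1", "S1", "E1", "A1"),
    ("P1", "TypeB", "kidney", "op2", "S2", "E2", "A2"),
    ("P2", "TypeA", "liver", "op1", "S3", "E3", "A3"),
])
sql, params = build_query({"project": "P1", "sample_type": "liver"})
rows = conn.execute(sql, params).fetchall()
```

Restricting column names to a fixed set and binding the values as parameters keeps user input out of the SQL text itself.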
Fig. 8C' illustrates a results area such as results area 804' of Fig. 8A' in a particular embodiment according to the present invention. Fig. 8C' illustrates results area 804' having an experimental results table 830', a query results table 832' and a pivot results table 834'.
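By way of illustration only, one plausible reading of a pivot results table is a rearrangement of flat query results so that each probe set becomes a row and each experiment a column. The record layout and function name below are assumptions made for this sketch.

```python
def pivot(records):
    """Pivot (probe_set, experiment, value) records into {probe_set: {experiment: value}}."""
    table = {}
    for probe_set, experiment, value in records:
        table.setdefault(probe_set, {})[experiment] = value
    return table


# Made-up query results, for illustration only.
records = [
    ("probe_1", "E1", 150.0),
    ("probe_1", "E2", 300.0),
    ("probe_2", "E1", 80.0),
]
pivoted = pivot(records)
```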
Figs. 8D'-8G' illustrate representative graphs such as can be displayed in graph section 806' of Fig. 8A' in a particular embodiment according to the present invention. These diagrams are merely illustrations and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 8D' illustrates a scatter-type graph of experimental results. The scatter graph can graph any numeric result on a logarithmic or linear scale. Further, a presently preferable embodiment can provide the capability to have multiple analyses per axis. A description of the probe set is included on the right side of the graph. A hotlink to external databases can also be provided, at least in the preferred embodiment according to the present invention. Other options such as filters, point sizes, colors and the like can be specified by the user.
Fig. 8E' illustrates a fold change graph that can be displayed in graph area 806' of Fig. 8A' in a particular embodiment according to the present invention. The fold change graph of Fig. 8E' can be provided using logarithmic or linear scales. The capability to provide a probe set description, hotlinks to external databases, and recomputation of the fold change can also be provided by particular embodiments according to the present invention. Further, users can specify options such as point sizes, colors and the like.
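By way of illustration only, a fold change between a baseline measurement and an experiment measurement can be computed as below. The text above does not state the formula used to compute or recompute fold change, so the signed-ratio convention and the function names here are assumptions of this sketch, chosen because they map naturally onto linear and logarithmic scales.

```python
import math


def fold_change(baseline, experiment):
    """Signed fold change: +2.0 for a doubling, -2.0 for a halving (assumed convention)."""
    if baseline <= 0 or experiment <= 0:
        raise ValueError("intensities must be positive")
    ratio = experiment / baseline
    return ratio if ratio >= 1 else -1 / ratio


def log2_ratio(baseline, experiment):
    """Base-2 logarithm of the ratio, symmetric around zero for up and down changes."""
    if baseline <= 0 or experiment <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(experiment / baseline)
```

For example, `fold_change(100, 200)` gives 2.0 and `fold_change(200, 100)` gives -2.0, while `log2_ratio` gives 1.0 and -1.0 for the same inputs.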
Fig. 8F' illustrates a representative bar graph such as can be displayed in graph area 806' of Fig. 8A' in a particular embodiment according to the present invention. The bar graph of Fig. 8F' can graph any numeric result and embodiments can provide the capability to users to change options such as bar size, colors and the like.
Fig. 8G' illustrates a representative histogram graph such as can be displayed in graph area 806' of Fig. 8A'. The histogram graph of Fig. 8G' provides the ability to histogram average differences to indicate various landmarks and can provide the user with the capability to specify options such as bin size, range, colors and the like.
Fig. 9A' illustrates a queries display screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 9A' illustrates named saved queries screen 901' having a display area for a plurality of filters. Users can define filters to the system and save them along with a reference name that is displayed by screen 901'. Filters can be saved to data mining information database 304' for later use.
Fig. 9B' illustrates an annotation screen 903' in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Annotation screen 903' provides a mechanism for displaying information about a probe set. Annotations can include an annotation text, a type of the annotation as well as other useful information. Annotation types can be user defined in a preferred embodiment. A user name can also be specified and a date can be specified. Other information can be specified in some embodiments and not all of this information will be specified in some embodiments.
Fig. 9C' illustrates an example of displaying a probe annotation such as was configured in annotation screen 903' of Fig. 9B' in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 9C' illustrates a highlighted line of information 904' for which a corresponding probe annotation 906' is displayed. The probe annotation can provide the name of the probe, a description and other useful information.
Fig. 9D' illustrates a query annotation screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 9D' illustrates query annotation screen 910' having fields to specify probe set types, annotations, a user identifier, a date, and a description. Query annotations can provide the ability to specify multiple filters and can also provide the ability to update annotations.
Fig. 9E' illustrates a probe set description screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 9E' illustrates probe set description screen 912' having the name of a probe set and an associated description. These descriptions can also be displayed in the expression data mining tool screen 801' under the results section 804'.
Fig. 9F' illustrates a search screen for searching array descriptions in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 9F' illustrates search array descriptions screen 914' having a search field 916' for accepting input, and an output field 918' for displaying the probe sets whose descriptions match the text entered in the search field. Search array descriptions screen 914' provides users with the capability to search descriptions in the database. The user can define the search criteria using the search field and can add the results to various filters.
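A description search of this kind might be sketched as below. The specification does not state how matching is performed, so case-insensitive substring matching is an assumption, and the function name `search_descriptions`, the probe set names, and the descriptions are all hypothetical placeholders.

```python
from typing import Dict, List, Set


def search_descriptions(descriptions: Dict[str, str], query: str) -> List[str]:
    """Return the names of probe sets whose description contains the query.

    Matching is a case-insensitive substring test (an assumption, not a
    behavior stated in the specification). An empty query matches nothing.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    return sorted(name for name, text in descriptions.items()
                  if needle in text.lower())


# Placeholder probe set descriptions for illustration only.
descriptions = {
    "probe_set_A": "Hypothetical protein kinase",
    "probe_set_B": "Hypothetical transcription factor",
    "probe_set_C": "Hypothetical KINASE inhibitor",
}

matches = search_descriptions(descriptions, "kinase")

# The results can then be added to a saved filter, modeled here as a set.
saved_filter: Set[str] = set()
saved_filter.update(matches)
```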
Fig. 10A' illustrates screens for searching external databases in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 10A' illustrates a probe set description dialog screen 1002' having a probe set name, a description and various annotations. Using the probe set description dialog screen 1002', the user can search external databases for information corresponding to the description. When the user selects the Entrez database in dialog screen 1002', a browser window 1004' is displayed. Browser window 1004' provides for browsing information about genetic expression sequences and the like in external databases such as the Entrez database. In a presently preferred embodiment, a URL can be associated with a particular probe set. Further, multiple URLs can be associated with a particular probe set, and a browser window can be automatically activated by the system to display relevant information about a probe set from external databases.
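The association of one or more URLs with a probe set could be modeled as in the following sketch. The class name `ProbeSetLinks`, the probe set name, and the `example.org` addresses are hypothetical placeholders and do not represent the actual address format of any external database.

```python
from collections import defaultdict
from typing import DefaultDict, List
from urllib.parse import quote


class ProbeSetLinks:
    """Hypothetical registry mapping a probe set name to its external URLs."""

    def __init__(self) -> None:
        self._links: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, probe_set: str, url: str) -> None:
        """Associate a URL with a probe set, ignoring exact duplicates."""
        if url not in self._links[probe_set]:
            self._links[probe_set].append(url)

    def urls_for(self, probe_set: str) -> List[str]:
        """Return a copy of the URLs registered for a probe set."""
        return list(self._links.get(probe_set, []))


links = ProbeSetLinks()
name = "probe set A"
# Placeholder addresses; quote() escapes the space in the probe set name.
links.add(name, "https://example.org/lookup?term=" + quote(name))
links.add(name, "https://example.org/sequences/" + quote(name))
links.add(name, "https://example.org/sequences/" + quote(name))  # duplicate, ignored

urls = links.urls_for(name)
```

Each stored URL could then be handed to a browser, for example with the standard library call `webbrowser.open(url)`.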
Fig. 10B' illustrates a FAQ display selection screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 10B' illustrates a FAQ selection screen 1008' having a plurality of frequently used searches. A user can perform one of the searches by simply selecting the desired search. A dialog screen 1010' can be displayed to the user upon selection of a particular FAQ. Dialog screen 1010' provides a plurality of questions that the user can answer in order to define the selected search. In a presently preferable embodiment, FAQs can be stored in data mining information database 306'. Questions associated with a particular query, English translations and SQL statements can also be stored in the database with the FAQ.
Fig. 10C illustrates a gene chip migration screen in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fig. 10C illustrates gene chip migration screen 1022' having a display area for local files in a plurality of formats 1024', a display area 1026' indicating data to migrate, a status area 1028' and a LIMS sample area 1030'. The migration screen can be used to add gene chip data to the LIMS. In a preferred embodiment, it can facilitate association of information about samples, experiments, scan data and results. Further, some embodiments can perform simulations of workflow.
Fig. 10D' illustrates fluidics station control screens 1031' and 1032' in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Fluidics control screens 1031' and 1032' can provide the user with the capability to control a fluidics station based upon selection of particular experiment names and protocols. The user can specify assay types, sample projects, reagents and protocols using the fluidics control screens.
Fig. 10E' illustrates scanner control screens 1041' and 1042' for controlling scanning to a local drive or to a network in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Scan control screens 1041' and 1042' provide the user with the capability to specify experiment names, probe array types, the number of scans to be performed, assay types, sample projects and experiments, and provide a display of the scanned experiments.
Fig. 10F' illustrates experiment information screens 1051' and 1052' in a particular embodiment according to the present invention. This diagram is merely an illustration and should not limit the scope of the claims herein. One of ordinary skill in the art would recognize other variations, modifications, and alternatives. Experiment information screens 1051' and 1052' provide the user with the capability to specify experiment names, probe arrays, probe array lots, operators, sample types, sample descriptions, projects, comments, reagents and reagent lots.
CONCLUSION
In conclusion, the present invention provides a method for mining experiment information for patterns selectable by a user. One advantage is that the method provides better access to genetic expression information than methods known in the prior art. Another advantage provided by this approach is that the results of numerous experiments can be mined effectively using visualization techniques and set theory queries, for example.
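As an illustration of the set theory queries mentioned above, the following sketch combines two resulting gene sets using Python's built-in set operators. The gene names and the meaning assigned to each set are hypothetical placeholders, not results from the specification.

```python
# Two hypothetical resulting gene sets, e.g. from two separate queries.
set_one = {"gene_a", "gene_b", "gene_c"}
set_two = {"gene_b", "gene_c", "gene_d"}

in_both = set_one & set_two          # intersection: genes present in both results
in_either = set_one | set_two        # union: genes present in at least one result
only_in_one = set_one - set_two      # difference: genes in the first result only
in_exactly_one = set_one ^ set_two   # symmetric difference: genes in just one result
```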
It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. For example, tables may be deleted, contents of multiple tables may be consolidated, or contents of one or more tables may be distributed among more tables than described herein to improve query speeds and/or to aid system maintenance. Also, the database architecture and data models described herein are not limited to biological applications but may be used in any application. All publications, patents, and patent applications cited herein are hereby incorporated by reference.

Claims

WHAT IS CLAIMED IS:
1. A computer based method for mining a plurality of experiment information for a pattern, said method comprising: collecting information from experiments and chip designs; selecting from said experiments and said chip designs ones to be mined; defining at least one of a plurality of groupings for said experiments to be mined; selecting based upon said at least one of a plurality of groupings, information about said plurality of experiments to be mined, forming a plurality of resulting information, said plurality of resulting information including at least a resulting gene set; and formatting said plurality of resulting information for viewing by a user.
2. The method of claim 1 wherein experiments to be mined are selected based upon at least one of a plurality of experimental analyses.
3. The method of claim 1 wherein said at least one of a plurality of groupings is a sample type.
4. The method of claim 1 wherein said at least one of a plurality of groupings is a sample attribute.
5. The method of claim 1 wherein said plurality of groupings are sample attributes having a non-hierarchical arrangement.
6. The method of claim 1 further comprising adding experiments to said experiments to be mined.
7. The method of claim 1 further comprising deleting experiments to said experiments to be mined.
8. The method of claim 1 wherein said pattern is a gene pathway.
9. The method of claim 1 wherein said pattern is a drug toxicity.
10. The method of claim 1 further comprising enabling a user to apply set theory operations on said resulting gene sets.
11. A computer based method for working with expression information, said method comprising: collecting information about a plurality of results of a plurality of experiments; gathering information about samples and information about said plurality of experiments; adding at least one of a plurality of attributes to said information about said plurality of experiments; transforming said plurality of results of experiments, to form a plurality of transformed information; mining said plurality of transformed information; and visualizing said plurality of transformed information.
12. The method of claim 11 wherein said information about said plurality of experiments comprises at least one of a plurality of experimental analyses.
13. The method of claim 12 wherein said at least one of a plurality of experimental analyses comprises one or more experimental analyses.
14. The method of claim 11 wherein said transforming comprises normalizing and said transformed information comprises normalized information.
15. The method of claim 11 further comprising recording one or more results of said mining said plurality of transformed information.
16. The method of claim 11 further comprising citing theories about said transformed information.
17. A computer program product for mining a plurality of experiment information for a pattern, said computer program product comprising: code for collecting information from experiments and chip designs; code for selecting a subset of said experiments and said chip designs, said subset being a plurality of experiments to be mined; code for defining at least one of a plurality of groupings for said experiments to be mined; code for selecting based upon said at least one of a plurality of groupings, information about said plurality of experiments to be mined, to form a plurality of resulting information, said plurality of resulting information including at least a resulting gene set; code for formatting said plurality of resulting information for viewing by a user; and a computer readable storage medium for containing the codes.
18. The program product of claim 17 wherein said at least one of a plurality of groupings is a sample type.
19. The computer program product of claim 17 wherein said at least one of a plurality of groupings is a sample attribute.
20. The computer program product of claim 17 wherein said plurality of groupings are sample attributes having a non-hierarchical arrangement.
21. The computer program product of claim 17 further comprising code for adding experiments to said experiments to be mined.
22. The computer program product of claim 17 further comprising code for deleting experiments to said experiments to be mined.
23. The computer program product of claim 17 wherein said pattern is a gene pathway.
24. The computer program product of claim 17 wherein said pattern is a drug toxicity.
25. The computer program product of claim 17 further comprising code for enabling a user to apply set theory operations on said resulting gene sets.
26. A computer program product for working with expression information, said computer program product comprising: code for collecting information about a plurality of results of a plurality of experiments; code for gathering information about samples and information about said plurality of experiments; code for adding at least one of a plurality of attributes to said information about said plurality of experiments; code for transforming said plurality of results of experiments, to form a plurality of transformed information; code for mining said plurality of transformed information; code for visualizing said plurality of transformed information; and a computer readable storage medium for storing the codes.
27. The computer program product of claim 26 further comprising code for citing theories about said transformed information.
28. The computer program product of claim 26 wherein said code for transforming further comprises code for normalizing and said transformed information further comprises normalized information.
29. A system for managing expression information comprising: a database; a computer memory; and a processor, said processor operatively disposed to: collect information about a plurality of results of a plurality of experiments; gather information about samples and information about said plurality of experiments; add at least one of a plurality of attributes to said information about said plurality of experiments; transform said plurality of results of experiments, to form a plurality of transformed information; mine said plurality of transformed information; and visualize said plurality of transformed information.
30. A computer based method for managing information about a plurality of experiments conducted on a plurality of samples, wherein each experiment provides an indication of a degree of expression of particular genetic sequences in a sample, said method comprising: registering at least one of said plurality of samples with a centralized database; tracking a plurality of information about said plurality of samples; tracking a plurality of information about said plurality of experiments; producing a sample history about said plurality of samples from said plurality of information; filtering said plurality of information about said plurality of experiments and said plurality of information about said plurality of samples according to filter input by a user to form a plurality of expression sequence information; publishing said plurality of expression sequence information; and providing a web based user interface to said user to enable the user to access said information.
31. The method of claim 30 wherein said information about said plurality of experiments includes a status of each of said plurality of experiments.
32. The method of claim 30 wherein said information about said plurality of experiments includes a result for each of said plurality of experiments.
33. The method of claim 30 wherein said information about said plurality of experiments includes a probe array type of each of said plurality of experiments.
34. The method of claim 30 wherein said information about said plurality of experiments includes a probe array lot number of each of said plurality of experiments.
35. The method of claim 30 wherein said information about said plurality of sample includes a sample type of each of said plurality of experiments.
36. The method of claim 30 wherein said information about said plurality of sample includes a sample project of each of said plurality of experiments.
37. The method of claim 30 wherein said plurality of experiments includes at least two experiments for each sample in said plurality of samples.
38. The method of claim 30 wherein said plurality of experiments includes one experiment for at least two samples in said plurality of samples.
39. A system for tracking information obtained from a plurality of gene expression sequence experiments, said system comprising: a server, having a data storage, said server operatively disposed to registering at least one of said plurality of samples with a centralized database; tracking a plurality of information about said plurality of samples; tracking a plurality of information about said plurality of experiments; producing a sample history about said plurality of samples from said plurality of information; filtering said plurality of information about said plurality of experiments and said plurality of information about said plurality of samples according to filter input by a user to form a plurality of expression sequence information; publishing said plurality of expression sequence information; and providing a web based user interface to said user to enable the user to access said information.
40. The system of claim 39 wherein said data storage is a GATC compliant database.
41. The system of claim 39 wherein said data storage is a plurality of relational databases.
42. The system of claim 39 further comprising a client connected to said server, said client operatively disposed to submit queries to said data storage of said server, said client further operatively disposed to receive responses from said server containing information contained in said data storage.
43. The system of claim 42 wherein said client and said server are interconnected by an internetwork.
44. A method for viewing a result of a plurality of experiments conducted on a plurality of samples, said results stored in at least one of a plurality of databases, said method comprising the steps: specifying which database to query; submitting at least one of a plurality of queries to form a result; viewing said result; filtering said result according to at least one of a plurality of user specified factors of interest to form a filtered result; and putting said filtered result into a graphical form.
45. A computer program product for managing information about a plurality of experiments conducted on a plurality of samples, wherein each experiment provides an indication of a degree of expression of particular genetic sequences in a sample, said product comprising: code for registering at least one of said plurality of samples with a centralized database; code for tracking a plurality of information about said plurality of samples; code for tracking a plurality of information about said plurality of experiments; code for producing a sample history about said plurality of samples from said plurality of information; code for filtering said plurality of information about said plurality of experiments and said plurality of information about said plurality of samples according to filter input by a user to form a plurality of expression sequence information; code for publishing said plurality of expression sequence information; code for providing a web based user interface to said user to enable the user to access said plurality of expression sequence information; and a computer readable storage medium for holding the codes.
46. The computer program product of claim 45 wherein said information about said plurality of experiments includes a status of each of said plurality of experiments.
47. The computer program product of claim 45 wherein said information about said plurality of experiments includes a result for each of said plurality of experiments.
48. The computer program product of claim 45 wherein said information about said plurality of experiments includes a probe array type of each of said plurality of experiments.
49. The computer program product of claim 45 wherein said information about said plurality of experiments includes a probe array lot number of each of said plurality of experiments.
50. The computer program product of claim 45 wherein said information about said plurality of sample includes a sample type of each of said plurality of experiments.
51. The computer program product of claim 45 wherein said information about said plurality of sample includes a sample project of each of said plurality of experiments.
52. The computer program product of claim 45 wherein said plurality of experiments includes at least two experiments for each sample in said plurality of samples.
53. The computer program product of claim 45 wherein said plurality of experiments includes one experiment for at least two samples in said plurality of samples.
54. A computer based method for managing information about a plurality of experiments conducted on a plurality of samples, wherein each experiment provides an indication of a degree of expression of particular genetic sequences in a sample, said method comprising: tracking information about said plurality of experiments conducted on said plurality of samples to form a database of information; analyzing the results of the tracking step; querying the database.
EP99954613A 1998-09-17 1999-09-15 Method and apparatus for providing an expression data mining database and laboratory information management Withdrawn EP1038245A4 (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
1997-01-14
US10074098P 1998-09-17 1998-09-17
US10072498P 1998-09-17 1998-09-17
US100724P 1998-09-17
US100740P 1998-09-17
US09/354,935 US6185561B1 (en) 1998-09-17 1999-07-15 Method and apparatus for providing and expression data mining database
US354935 1999-07-15
US09/397,494 US20030028501A1 (en) 1998-09-17 1999-09-15 Computer based method for providing a laboratory information management system
PCT/US1999/021305 WO2000016220A1 (en) 1998-09-17 1999-09-15 Method and apparatus for providing an expression data mining database and laboratory information management

Publications (2)

Publication Number Publication Date
EP1038245A1 (en) 2000-09-27
EP1038245A4 EP1038245A4 (en) 2002-11-13

Family

ID=27493152

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99954613A Withdrawn EP1038245A4 (en) 1998-09-17 1999-09-15 Method and apparatus for providing an expression data mining database and laboratory information management

Country Status (1)

Country Link
EP (1) EP1038245A4 (en)

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALLEE C: "DATA MANAGEMENT FOR AUTOMATED DRUG DISCOVERY LABORATORIES" LABORATORY ROBOTICS AND AUTOMATION, VCH PUBLISHERS, NEW YORK, US, vol. 8, no. 5, 1996, pages 307-310, XP000901022 ISSN: 0895-7533 *
ERMOLAEVA O ET AL: "DATA MANAGEMENT AND ANALYSIS FOR GENE EXPRESSION ARRAYS" NATURE GENETICS, NEW YORK, NY, US, no. 20, September 1998 (1998-09), pages 19-23, XP002950500 ISSN: 1061-4036 *
KERLAVAGE A R ET AL: "DATA MANAGEMENT AND ANALYSIS FOR HIGH-THROUGHPUT DNA SEQUENCING PROJECTS" IEEE ENGINEERING IN MEDICINE AND BIOLOGY MAGAZINE, IEEE INC. NEW YORK, US, vol. 14, no. 6, 1 November 1995 (1995-11-01), pages 710-717, XP000598295 ISSN: 0739-5175 *
See also references of WO0016220A1 *

Also Published As

Publication number Publication date
EP1038245A4 (en) 2002-11-13

Similar Documents

Publication Publication Date Title
US6185561B1 (en) Method and apparatus for providing and expression data mining database
US6229911B1 (en) Method and apparatus for providing a bioinformatics database
US6826296B2 (en) Method and system for providing a probe array chip design database
US20060020398A1 (en) Integration of gene expression data and non-gene data
US20030171876A1 (en) System and method for managing gene expression data
US20030100999A1 (en) System and method for managing gene expression data
US20050044110A1 (en) System and method for internet-accessible tools and knowledge base for protocol design, metadata capture and laboratory experiment management
US20030028501A1 (en) Computer based method for providing a laboratory information management system
CA2418475A1 (en) Integrated multidimensional database
WO2002073504A1 (en) A system and method for retrieving and using gene expression data from multiple sources
US20060047697A1 (en) Microarray database system
US20020052882A1 (en) Method and apparatus for visualizing complex data sets
US20060271513A1 (en) Method and apparatus for providing an expression data mining database
AU781841B2 (en) Graphical user interface for display and analysis of biological sequence data
WO2000016220A1 (en) Method and apparatus for providing an expression data mining database and laboratory information management
EP1366359A1 (en) A system and method for managing gene expression data
EP1038245A1 (en) Method and apparatus for providing an expression data mining database and laboratory information management
JP2003526133A6 (en) Method and apparatus for providing expression data mining database and laboratory information management
JP2003526133A (en) Method and apparatus for providing expression data mining database and laboratory information management
US20060212229A1 (en) Method and system for providing a probe array chip design database
EP1396800A2 (en) Method and apparatus for providing a bioinformatics database

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000613

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

A4 Supplementary search report drawn up and despatched

Effective date: 20021002

AK Designated contracting states

Kind code of ref document: A4

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

17Q First examination report despatched

Effective date: 20060331

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20070331