WO2001090951A2 - Internet-accessible system for data storage, retrieval and analysis based on a directory protocol - Google Patents

Internet-accessible system for data storage, retrieval and analysis based on a directory protocol

Info

Publication number
WO2001090951A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
protocol
database
information
directory
Prior art date
Application number
PCT/US2001/016375
Other languages
English (en)
Other versions
WO2001090951A3 (fr)
WO2001090951A9 (fr)
Inventor
Leonard A. Herzenberg
Wayne Moore
David Parks
Leonore Herzenberg
Vernon Oi
Original Assignee
The Board Of Trustee Of The Leland Stanford Junior University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Board Of Trustee Of The Leland Stanford Junior University filed Critical The Board Of Trustee Of The Leland Stanford Junior University
Priority to AU2001263335A priority Critical patent/AU2001263335A1/en
Publication of WO2001090951A2 publication Critical patent/WO2001090951A2/fr
Publication of WO2001090951A9 publication Critical patent/WO2001090951A9/fr
Publication of WO2001090951A3 publication Critical patent/WO2001090951A3/fr


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30Data warehousing; Computing architectures
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00ICT specially adapted for the handling or processing of medical references
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics

Definitions

  • The present invention is related to databases and the exchange of scientific information. Specifically, the invention discloses a unified scientific database (IBRSS) that allows researchers to easily share their data with other researchers.
  • IBRSS unified scientific database
  • The present invention also allows for ease of data collection, annotation, storage, management, retrieval and analysis of scientific data through and into the database. In addition, it allows for archival storage and retrieval of data collected directly from laboratory instruments to ensure data consistency for patent and other purposes. It also allows for ease of sharing data between laboratories in remote locations.
  • the present invention also supports the automated creation of experimental protocols.
  • FACS Fluorescent Activated Cell Sorting
  • Flow cytometry is a technique for obtaining information about cells and cellular processes by allowing a thin stream of a single cell suspension to "flow" through one or more laser beams and measuring the resulting light scatter and emitted fluorescence. Since there are many useful ways of rendering cells fluorescent, it is a widely applicable technique and is very important in basic and clinical science, especially immunology. Its importance is increased by the fact that it is also possible to sort fluorescent labeled live cells for functional studies with an instrument called the Fluorescence Activated Cell Sorter (FACS).
  • FACS Fluorescence Activated Cell Sorter
  • Gel electrophoresis is a standard technique used in biology. It is designed to allow samples to be pulled through a semisolid medium such as agar by an electric field. This technique allows for the separation of small molecules and macromolecules by either their size or charge.
  • FIG. 1 is a diagram of the flow of information in a biological experiment
  • Fig. 2 is a diagram of a directory archival system
  • Fig. 3 is a diagram of information flow from instruments to and from the database (IBRSS) in one embodiment of the present invention
  • Fig. 4 is a diagram of information flow from instruments, analysis programs, remote databases, and other software and to the central database in one embodiment of the present invention.
  • Fig. 5 is a diagram of the hierarchical structure of a single study
  • the present invention will be best understood from the point of view of a laboratory worker using the invention.
  • the invention may allow the user to simplify laboratory work by allowing interactive automation of much of the work with the use of a computer.
  • the work that may be performed by the present invention may be able to make the researcher more efficient.
  • The steps of the laboratory process the invention may address are collecting, sharing, retrieving, analyzing, and annotating data.
  • While the present invention has equal application to the storage of any data type, one embodiment relates to the storage of data associated with a biological sample.
  • the first step the researcher may perform is to define a study 501.
  • a study may be defined as the overall goal of the research the researcher may wish to attain.
  • the study may contain protocols that capture the hypothesis to be tested and the factors that go into them, including subjects, treatments, experiments, samples and the study timeline.
  • the study may contain data and information collected in experiments that are part of the study. This may create a parent study node under which information and data pertaining to the study may be kept in child nodes.
  • the present invention may allow a researcher to create experiments and experimental protocols 502 and 503 that may become part of the overall study.
  • the experiment may contain protocols that acquire information to define the subset of subjects for which the data may be collected, the set of samples to be obtained from the subjects, and the analytic procedures and data collection instruments used to analyze the samples.
  • the experiment protocol may become a child node of its parent study.
  • the researcher using the present invention also may obtain data 504 and 505 for each study and experiment he performs.
  • the data may be collected each time the researcher performs the same experiment protocol.
  • the data may also contain protocols designed to acquire annotation information to define the subdivision (aliquotting) and the treatment (reagents and conditions) for a set of samples for which data may be collected by a single analytical method (usually a single instrument).
  • Researchers then analyze data they obtain, and the researcher using the present invention may analyze the collected data. This analysis may be stored as a child node of the data or the annotation of the data 506 and 507.
  • the present invention may create Internet addresses for all of the results of the individual analyses and for the data sets created. These may be child nodes 508 and 509 of the data or experiment information. Thus, the present invention allows the user to possess unique web addresses for any of the data or analysis results that he may wish to include in a publication.
  • the study, experimental protocol, data collection, and analysis results, may be stored as described in FIG. 5.
  • the study and the experiment are still the touchstone of research science.
  • the present invention may allow the researcher to interactively create protocols for studies and experiments.
  • the protocol creators may use wizards to ease the researcher's creation of the protocols.
  • the researcher may invoke a protocol creator/editor on a computer.
  • the computer may provide the researcher with a list of possible studies or experiments the researcher may wish to perform.
  • the computer may also provide the ability for the researcher to create an entirely new type of study or experiment. After the type of study or experiment is chosen, the researcher may then be a given the option of how to set up the experiment.
  • the types of experiments that will be described in this application specifically are clinical and basic studies and FACS and electrophoresis gel experiments.
  • Other types of data that can be similarly stored and used within the database include DNA microarray data and clinical data.
  • The clinical data may include red blood cell counts and indices (RBC, MCV, MCH, MCHC) and potassium levels, or may include observational data such as blood pressure, temperature, types of drugs taken, race, age, etc.
  • An example of a study may be a clinical study.
  • the study may be designed to test one or more hypotheses.
  • An example of a hypothesis may be testing whether the number of CD8 T cells is correlated with the erythrocyte volume.
  • HIV-infected patients may be recruited on the basis of meeting a series of entry criteria. Examples of such criteria are:
  • Experiments in the study may be conducted on samples from patients to determine whether the patient meets the entry criteria for the study.
  • information and experiment results for each potential study entrant may be stored in the study.
  • the study may contain experiments such as staining cells from the patients with antibodies that reveal cells that express surface CD4 and analyses such as those that enumerate the number of cells expressing CD4.
  • Relevant information about the subjects (patients) in the study may be passed from the study to protocol wizards that may help the user define the contents of experiments such as which samples from which subjects may be examined.
  • the study may also allow the user to select from model protocols for the experiment to define types and the amounts of the FACS reagents that may be used.
  • the study subject may appear on a list from which the user chooses the samples to be examined in an experiment.
  • the study may also specify that the protocol automatically send data that is collected to analysis programs and provide necessary information to enable the automated analysis and to return specified results of the analysis to the study.
  • the study may be triggered to specify automated analyses that return further digested results to the study.
  • One result of this process may be the automatic identification of subjects that qualify for further study by determining that the study criteria are met, for example that the subjects' erythrocyte counts and CD4 counts are within the specified ranges.
  • the automated analysis may include the returning of FACS plots comparing CD4 and CD8 levels, the returning of charts with each subject's mean levels of CD4, CD8, erythrocyte counts, or other specified variables.
  • the automated analyses may also specify the performance of statistical procedures and the return of results of these analyses.
  • the study may have methods for summarizing and displaying results of analyses.
  • The study may track samples to determine whether required experiments were performed and specified data returned, and may contain information about the physical location of stored samples, the amount of the sample that has been used, and the treatment of the sample.
  • A basic research study may contain samples from mice, information about the genetic makeup of the mice, and references to genome and other databases relevant to the mice. It may also contain information about the treatments that individual or groups of mice were given or may be given during the experiment and about the drugs or other materials with which the mice were or may be treated. The study may also contain the timeline for treatment and, as above, define protocols and automated analyses for collected data.
  • A FACS experiment in a study comprises staining cells with various fluorescent antibodies, running them through a cell sorter, and possibly collecting the sorted cells.
  • the wizard may help the experimenter create his experiment by creating a suggested protocol for him to follow.
  • the wizard or other interactive device may ask the researcher how many different stains he wishes to use to mark various structures. These stains may, but do not necessarily need to be stains for different structures. Typically the stains may be fluorescent conjugated antibodies.
  • the user may then inform the protocol creator which structures he wishes the stains to mark and the wizard may respond with an offer of a series of "option" lists from which the user may select the type of cells and the specific reagents to be used in the experiment.
  • Option lists may be generic types of cells or cells and samples specified in the parent study to which the experiment belongs.
  • the wizard then may ask the researcher which FACS machine he plans to use.
  • Each FACS machine may be equipped with different lasers or light filters enabling different FACS machines to collect data for antibodies labeled with different fluorescence "colors".
  • The wizard may then determine whether the FACS machine specified by the user is able to take data for the fluorescent reagents selected in the protocol. Alternatively, the wizard may suggest which of the FACS machines available to the user can be used. In either case, the wizard may then assist the user in scheduling an appropriate analysis time period on an appropriate FACS machine.
  • the protocol creator may use combinatorics or other procedures to define the reagent and cell sample combinations that the user may have to pipet (add to tubes) to complete the experiment and create a protocol for the researcher to follow.
  • This protocol may specify the control tubes that are required and provide the concentrations and amounts of antibodies to use, the dilutions of the antibodies, the various steps to perform, the various centrifugations to perform, and the FACS to operate.
  • a control tube may be suggested for each antibody employed in the study. Further a blank control tube for each separate organism may be suggested to determine autofluorescence.
  • the reagents used by the protocol may have attributes associated with them. These attributes may include the reagent's distinguished name, Clone ID, Common name, Specificity, Titre, Fluorochrome Name, Fluorochrome Lot number, and concentration.
  • the user may be prompted to select the reagents used through a "Reagent Palette".
  • Such a palette may contain a catalog of reagents in stock, pre-determined sets of reagents typically used in similar protocols, and an ability for the user to enter a new choice of reagents for the experiment.
  • The protocol creator may also perform various tasks behind the scenes to create a valid protocol for the researcher, to call for pre-packaged analyses, to check data quality during data collection, and to display the information about the reagents and cells in a sample at the time of data collection or any other time.
  • The protocol editor may be tied to a database to enhance its efficiency, as well as the researcher's. In the previous example, several items may be used from the database to create the FACS protocol. For example,
  • The database may hold data for the fluorescent recognition abilities of all of the FACS machines available to the user. This may allow the protocol editor to select only those reagents that are available to the user and can be viewed by the FACS chosen by the user. There are a wide variety of possible combinations of reagent choices that can be selected. Specifically, there may be n!/((n-k)!k!) possible reagent choices, where n is the total number of fluorescent "colors" for which the FACS can collect data and k is the number of stains used in the experiment.
  • The present invention may provide a novel way to enhance the effectiveness and speed of the selection of the reagent combination by applying well-known combinatorial techniques and depth-first search in a new way to this biological problem. This may be performed by selecting one reagent at a time recursively. If the most recently added reagent cannot be used with the current set, then that reagent may be removed from the list of suggested reagents. The algorithm may run until a set of usable reagents is determined.
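The recursive selection described in this passage can be sketched as a simple depth-first search. Everything below is illustrative: the fluorochrome names, the conflict table, and the compatibility rule are assumptions, not part of the disclosure.

```python
# Hypothetical conflict table: two fluorochromes conflict if their emission
# spectra overlap too much (names and pairs are assumptions).
CONFLICTS = {("FITC", "GFP"), ("PE", "PI")}

def compatible(reagent, chosen):
    """A reagent is usable if it conflicts with none of the reagents already chosen."""
    return all((reagent, c) not in CONFLICTS and (c, reagent) not in CONFLICTS
               for c in chosen)

def select_reagents(candidates, k, chosen=()):
    """Depth-first search: add one reagent at a time; backtrack (drop the most
    recently added reagent) when it cannot be used with the current set."""
    if len(chosen) == k:
        return list(chosen)
    for i, r in enumerate(candidates):
        if compatible(r, chosen):
            result = select_reagents(candidates[i + 1:], k, chosen + (r,))
            if result:
                return result
    return None  # no usable combination of size k exists

print(select_reagents(["FITC", "GFP", "PE", "APC"], 3))  # → ['FITC', 'PE', 'APC']
```

Here the search skips GFP (which conflicts with FITC in the assumed table) and settles on a compatible set of three.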
  • The protocol creator may also consult laboratory databases to determine how much of each reagent may be available to the user. If the protocol creator finds that the amount of reagent available is below a pre-set threshold, it may automatically indicate the reagent shortage and suggest another combination to be used. The protocol creator may also consult the database as to the effectiveness of each stain in binding to the type of cell being used. It may then use a greedy or any other algorithm (such as the ones suggested above for selecting reagent combinations) to select an optimal set of stains to be used in the experiment. Other factors may also be taken into this optimization, including the price of the reagents, the temperature compatibility of the reagents in a given combination, and the resolution possible for target cell surface or internal markers when stained with the selected reagent combination. This may be performed using a scoring function that provides a score for each of the factors in selecting the reagents.
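A scoring function of the kind mentioned above might be sketched as a weighted sum over factors such as effectiveness, resolution, and price, followed by a greedy pick. The weights, reagent names, and attribute values here are invented for illustration.

```python
# Assumed weights: effectiveness and resolution count for a reagent,
# price counts against it.
WEIGHTS = {"effectiveness": 0.5, "resolution": 0.3, "price": -0.2}

def score(reagent):
    """Weighted sum over the optimization factors; higher is better."""
    return sum(WEIGHTS[f] * reagent[f] for f in WEIGHTS)

def greedy_select(reagents, k):
    """Greedily keep the k highest-scoring reagents."""
    return sorted(reagents, key=score, reverse=True)[:k]

reagents = [
    {"name": "anti-CD4 FITC", "effectiveness": 0.9, "resolution": 0.8, "price": 0.5},
    {"name": "anti-CD8 PE",   "effectiveness": 0.7, "resolution": 0.9, "price": 0.2},
    {"name": "anti-CD3 APC",  "effectiveness": 0.6, "resolution": 0.5, "price": 0.9},
]
best = greedy_select(reagents, 2)
print([r["name"] for r in best])
```

A real scoring function would also fold in the stock levels and temperature-compatibility checks the text mentions; this sketch keeps only the weighted-sum core.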
  • the protocol creator may suggest the layout of the wells, tubes, or containers used to perform the experimental protocol.
  • the layout may depend on the proximity of like samples, like reagents, and controls.
  • The layout may also be created to minimize the movement of the person undertaking the protocol. One such instance is when several tubes require the same reagent cocktail; in this case, it would be of benefit to have those wells, tubes, or containers located near one another.
  • the protocol editor may also suggest the creation of reagent cocktails when several reagents with the same proportions are needed in various wells, tubes, and containers.
  • The reagent cocktails may be designed by determination of like reagents used in multiple wells. This determination may be made through linear programming or another optimization routine designed to minimize the number of pipetting steps or any other experimental concern such as time, cost, or ease.
  • the constraints for such a linear programming model may include any of the aforementioned factors contributing to experimental time, ease, or cost.
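Rather than a full linear program, a minimal sketch of cocktail suggestion can greedily group tubes that call for an identical reagent set; any set shared by two or more tubes becomes a candidate cocktail, replacing one pipetting step per reagent with one per tube. All tube and reagent names below are assumptions.

```python
from collections import defaultdict

# Hypothetical per-tube reagent requirements (tube -> reagent set).
tubes = {
    "T1": frozenset({"anti-CD4", "anti-CD8"}),
    "T2": frozenset({"anti-CD4", "anti-CD8"}),
    "T3": frozenset({"anti-CD4", "anti-CD8", "anti-CD3"}),
    "T4": frozenset({"anti-CD3"}),
}

def suggest_cocktails(tubes, min_tubes=2):
    """Suggest a premixed cocktail for every reagent set shared by at
    least `min_tubes` tubes."""
    by_mix = defaultdict(list)
    for tube, reagents in tubes.items():
        by_mix[reagents].append(tube)
    return {mix: sorted(ts) for mix, ts in by_mix.items() if len(ts) >= min_tubes}

cocktails = suggest_cocktails(tubes)
for mix, ts in cocktails.items():
    print(sorted(mix), "->", ts)
```

Only the {anti-CD4, anti-CD8} set is shared by two tubes here, so it is the single suggested cocktail; a linear-programming formulation could additionally weigh partial overlaps (T3 shares two of its three reagents) against mixing cost.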
  • The protocol creator may also suggest the use of a different FACS machine capable of performing the experiment, either because that machine is cheaper to operate or because the reagents it requires are less expensive.
  • The protocol creator may also anticipate what type of data may be collected and may prepare tables and charts to be filled in after the experimental data is collected. One method of creating charts may be to create 2-axis graphs for all the pairs of data that the protocol is expected to collect.
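Preparing a 2-axis chart for every pair of expected parameters is a direct application of combinations; the parameter names below are assumed.

```python
from itertools import combinations

# Parameters the protocol is expected to collect (names are assumptions).
expected_params = ["FSC", "SSC", "CD4-FITC", "CD8-PE"]

# One empty 2-axis chart description for every unordered pair of parameters.
charts = [{"x": x, "y": y, "points": []} for x, y in combinations(expected_params, 2)]

print(len(charts))  # C(4, 2) = 6 pairs
```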
  • the protocol creator may then allow the user to store and re-use the protocol in the database under the current study or any other study the scientist wishes to use the protocol for.
  • the protocol creator may cooperate with the data collector to couple the collected data with the annotation information (reagents, cells, treatments) known to the creator and may send the coupled data and annotations to the database for permanent storage and archiving.
  • experiment-related information may be sent to the database to be coupled with the sample data and annotation. These couplings may be accomplished by storing the data separately from the annotation data and associating these items permanently by use of non- volatile pointers or some other means.
  • the parent study may also be informed of the completion of the experiment and the location of the output from the experiment (protocol and data collection).
  • This experiment may create data that may automatically be captured by the database, coupled with the annotation information in the protocol, transferred from the machine used to collect the data (FACS, in the example above) directly to the proper location for the particular experimental data.
  • This can be performed in several ways, including the use of LDAP, XML and XSL style sheets.
  • Analysis programs may automatically perform preliminary analysis specified by the protocol or elsewhere.
  • the protocol editor may determine the nature of data and may inform the analysis program the type of data that is represented.
  • The data types may include nominal, ordinal, or continuous variables that are either dependent or independent.
  • the variables may also be crossed or nested.
  • analyses may be informed by the annotation and possibly other information associated with the data (such as data type) collected for each sample.
  • Results from these preliminary analyses may be stored and associated with the collected data and be locatable via an experiment data tree that may be available for the experimenter to view.
  • For FACS analysis, the collected and annotated data may automatically be sent to a FACS data analysis program such as FlowJo or CellQuest.
  • FACS analysis software may suggest possible gating strategies with the use of clustering algorithms or other artificial intelligence techniques. Further gating data may be displayed using the annotations from the protocol editor to determine the labeling of the axes of the displayed data.
  • The data also may be sent for analysis to a statistics analysis package such as JMP (from the SAS Institute).
  • the data may be automatically processed to determine such statistics as median attribute values and standard deviations of attribute values.
  • Gel electrophoresis may also be incorporated into the current system of protocol development.
  • The protocol creation wizard may prompt the user to select/input the type of gel that is to be run. These gels may include those for a Northern or Southern blot. Further, the wizard may prompt the user to input the number of lanes in the gel and select the sample to be placed in each lane.
  • The sample may be defined at the protocol level or may be selected from a list generated from information already entered into the study to which the experiment protocol belongs.
  • the protocol creation wizard may prompt the user to determine which type or types of standard controls, such as ladders, are going to be used in the experiment.
  • the protocol wizard may suggest the lanes that each specimen should be placed in according to rules pre-defined for the type of gel and sample in the experiment.
  • the user may bring the gel to an instrument for automated or manual data collection. For instance, the user may bring the gel to an ultraviolet gel reader connected to a computer. The reader may take a picture of the gel and send a digitized version, coupled with the protocol information that describes the sample and the experiment, to a central data store for archiving. The gel reader may then send the digitized picture to an analysis program.
  • the data in the data store may be sent at the user's request, to the analysis program.
  • This analysis program may determine the size of each fragment found in the gel by comparing their positions to the positions of the ladder.
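One conventional way to implement this comparison (an assumption here, not stated in the text) is to interpolate log(fragment size) linearly against migration distance between the two flanking ladder bands:

```python
import math

# Hypothetical ladder: (migration distance in mm, known fragment size in bp).
ladder = [(10.0, 10000), (20.0, 3000), (30.0, 1000), (40.0, 300)]

def fragment_size(distance):
    """Estimate a fragment's size by linear interpolation of log10(size)
    against migration distance between the two flanking ladder bands."""
    for (d1, s1), (d2, s2) in zip(ladder, ladder[1:]):
        if d1 <= distance <= d2:
            t = (distance - d1) / (d2 - d1)
            log_size = math.log10(s1) + t * (math.log10(s2) - math.log10(s1))
            return round(10 ** log_size)
    raise ValueError("distance outside ladder range")

print(fragment_size(25.0))  # halfway between 3000 bp and 1000 bp bands
```

For a band at 25 mm, midway between the 3000 bp and 1000 bp ladder bands, interpolating in log space yields their geometric mean, about 1732 bp.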
  • The results of the analysis may then be archived in the database for later retrieval, further analysis, or abstraction into summaries in the parent study.
  • the parent study may also be informed of the completion of the experiment and the location of the output from the experiment (protocol and data collection).
  • The experimental models may be selected by the user to tell the protocol creator what type of experiment to create.
  • The experimental models may include: 1) Crossing Model: Many experiments are essentially combinatorial, i.e., a set of reagents or reagent cocktails is applied to each sample in a group of samples. Typically this may correspond to some N x M grid of wells in the staining plate. An experiment might have one or more of these repeated sets of reagents. 2) Titration Model: The user may specify a target sample and a reagent, and then a range of dilutions (2, 4, 8... or 10, 20, 50, 100 being typical). The layout of the dilution may be as a single column, a single row, or otherwise on the plate or other type of container.
  • 3) Screening Model: The user may specify a reagent cocktail and a large number of samples which are quasi-automatically named. 4) Fluorescence Compensation Controls Model: For each dye (or dye lot) which occurs in an experiment model, the user or protocol editor may specify a sample to be used as a control. Usually the control will be one of the samples which is stained with the reagent.
  • 5) Unstained Controls Model: The user or protocol editor may define an unstained or negative control for a protocol involving staining. Unstained controls and fluorescence compensation controls may be coupled together in a single experimental protocol to create a population of suitable controls.
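The Titration Model's single-column layout can be sketched as follows; the sample, reagent, and dilution values are illustrative, not from the disclosure.

```python
def titration_layout(sample, reagent, dilutions, column="A"):
    """Lay out one well per dilution down a single plate column:
    A1, A2, A3, ... with increasing dilution factors."""
    return [
        {"well": f"{column}{row}", "sample": sample,
         "reagent": reagent, "dilution": f"1:{d}"}
        for row, d in enumerate(dilutions, start=1)
    ]

wells = titration_layout("spleen cells", "anti-CD4 FITC", [2, 4, 8, 16])
print(wells[0])
```

Swapping the column letter for a row prefix would give the single-row layout the model also allows.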
  • the protocol editor may create a GUI representing the wells, tubes, or other containers holding the reagents and samples.
  • the user may be able to "drag and drop" the sample or reagent to another well, tube, or container to alter the experimental protocol the user created or the protocol creator suggested.
  • the software may test the hypothesis stated in the study protocols.
  • The hypothesis may be tested by combining the statistical information gathered during the experimental protocols and determining whether it fits the hypothesis. This determination may be done manually by viewing the data or automatically by allowing the data to be analyzed by a data analysis package such as JMP.
  • JMP may automatically analyze the data that may be specified by the user when the user creates an experimental protocol with the appropriate wizard. The wizard may then associate the expected data with the study node so that the hypothesis may automatically be tested.
  • the database may allow access to the data for several purposes.
  • the user may be able to provide hyperlinks to collected data and experimental protocols so that others may access the data and protocols. Others that would access the data may include collaborators, reviewers, and others reading published articles containing hyperlinks to the data.
  • the database may act as a cell surface expression library enabling people such as researchers and clinicians to facilitate diagnosis and definitions of new conditions by comparing the data from the database with locally collected data. Other uses of this database would be obvious to those skilled in the art.
  • the database may be constructed using any known database technique including the use of LDAP directories and protocols, XSLT style sheets, and XML documents.
  • the database may be at a centralized site remote to the experimenter. The experimenter may send or receive information between his computer and the database via the Internet or any other communication means.
  • LDAP is a "lightweight" (smaller amount of code) version of DAP (Directory Access Protocol), which is part of X.500, a standard for directory services in a network.
  • DAP Directory Access Protocol
  • the present invention may put these to unique uses in the scientific arena.
  • the style-sheet transformation language (XSLT) defines the transformation of the original input (XML) document to "formatting objects" such as those included in HTML documents. In a traditional style sheet, these are then rendered for viewing.
  • XSLT transformation grammar can also be used to transform XML documents from one form to another, as in the following examples: a) Loading directories.
  • XSLT may be used to transform an XML file generated by any data processing application to an XML representation of a directory (sub)tree, i.e., to extract directory entries from the XML document.
  • The ability to use XSLT for this transformation greatly simplifies the creation and maintenance of LDAP or other directories that serve diverse information derived from distinct sources (e.g., FACS instruments and genome data banks) that generate different types of XML documents. In essence, using XSLT removes the necessity of writing distinct Java code to construct the directory entries for each type of document.
  • a default XSLT directory style sheet can be created to extract a pre-defined set of indexing elements included in arbitrary XML documents. This would enable creation of the corresponding directory entries for these indexing elements.
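The patent describes this extraction with an XSLT style sheet; the sketch below illustrates the same idea in Python, pulling a pre-defined set of indexing elements out of an XML document and building a directory entry from them. The element names, `id` attribute, and base DN are assumptions, not the patent's schema.

```python
import xml.etree.ElementTree as ET

# A toy XML document standing in for instrument output (schema assumed).
doc = """<experiment id="exp42">
  <instrument>FACS</instrument>
  <operator>J. Doe</operator>
</experiment>"""

# Pre-defined set of indexing elements to extract, playing the role of the
# default directory style sheet described in the text.
INDEX_ELEMENTS = ["instrument", "operator"]

def to_directory_entry(xml_text, base_dn="ou=experiments,o=lab"):
    """Build a directory entry (DN plus indexed attributes) from the document."""
    root = ET.fromstring(xml_text)
    entry = {"dn": f"cn={root.get('id')},{base_dn}"}
    for name in INDEX_ELEMENTS:
        el = root.find(name)
        if el is not None:
            entry[name] = el.text
    return entry

print(to_directory_entry(doc))
```

A production pipeline would apply an actual XSLT transform per source document type; this sketch only shows the indexing-element extraction the style sheet performs.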
  • XSLT can be used to transform a subset of the information in an XML file so that it can be read by a program that takes XML input in a particular format.
  • XSLT can launch the program and pass the result of the transformation during the launch.
  • Using XSLT stylesheets, an analysis application can be launched by transforming an XML file containing the results of a directory search into an application-readable file containing URLs for the data and appropriate annotation information for the analysis. This option can be made available for all co-operating applications and need not be restricted to FACS data.
  • XSLT style sheets can be used to change the form of a document.
  • XSLT and other capabilities may be used to store analysis output along with the primary data and annotation information.
  • Other developed fully cooperating applications may be used to analyze FACS and other data.
  • LDAP server and client toolkits: standalone servers and LDAP-to-X.500 gateways are available from several sources. LDAP client libraries are available for the C language from the University of Michigan and Netscape, and for the Java language from Sun and Netscape.
  • LDAP is a standard that is directly utilized by the clients and makes it possible for all clients to talk to all servers.
  • SQL standardization may have more to do with transportability of programmers and database schemas than with interoperability of databases.
  • the X.500 information model is extremely flexible and its search filters provide a powerful mechanism for selecting entries, at least as powerful as SQL and probably more powerful than typical OODB.
  • the standard defines an extensibleObject that can have any attribute.
  • some stand-alone LDAP implementations permit relaxed schema checking, which in effect makes any object extensible. Since an attribute value may be a distinguished name, directory entries can make arbitrary references to one another, i.e., across branches of the directory hierarchy or between directories.
  • LDAP and X.500 servers permit fine grained access control. That is to say, access controls can be placed on individual entries, whole sub trees (including the directory itself) and even individual attributes if necessary. This level of control is not available in most existing databases.
  • LDAP directory is organized in a simple "tree" hierarchy consisting of the following levels:
  • This example tree structure of an LDAP directory is illustrated in Figure 2.
  • the parent node of the tree is the root node 201.
  • the children of the root directory are country nodes 202.1 and 202.2.
  • Each country node can have child organization nodes such as organization nodes 203.1 and 203.2 (children of country node 202.2).
  • organization group nodes, such as node 204.3, which are children of organization node 203.2.
  • Each group can have children nodes representing individuals, such as group node 204.3 having children nodes 205.1, 205.2, and 205.3.
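The tree hierarchy above determines naming: each entry is addressed by a distinguished name composed of one relative distinguished name per level, most specific first. A minimal sketch follows; the attribute values (name, group, organization, country) are illustrative and not taken from Figure 2.

```java
public class DnExample {
    // Compose a distinguished name (dn) from relative distinguished names
    // (rdns), one per level of the tree: individual (205.x) within group
    // (204.x) within organization (203.x) within country (202.x).
    static String dn(String... rdns) {
        return String.join(",", rdns);
    }

    public static void main(String[] args) {
        String name = dn("cn=Jane Doe", "ou=FACS Facility",
                         "o=Stanford University", "c=US");
        System.out.println(name);
        // cn=Jane Doe,ou=FACS Facility,o=Stanford University,c=US
    }
}
```

Because each RDN is unique only within its parent, the concatenation is unique across the whole directory, which is the property the naming scheme for biological samples discussed below relies on.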
  • DNS Domain Name System
  • LDAP makes it possible to search for an individual without knowing the domain.
  • An LDAP directory can be distributed among many servers. Each server can have a replicated version of the total directory that is synchronized periodically.
  • An LDAP server is called a Directory System Agent (DSA).
  • DSA Directory System Agent
  • An LDAP server that receives a request from a user takes responsibility for the request, passing it to other DSAs as necessary, but ensuring a single coordinated response for the user.
  • the present invention contemplates extensions and modifications to LDAP protocols to make them usable not just as directories, but to also provide data itself.
  • the present invention takes advantage of hierarchical levels of LDAP already established by the International Standards Organization (ISO) and uses those organizations to provide a first level of uniqueness to the biological sample to be named.
  • ISO International Standards Organization
  • Referrals mean that one server which cannot resolve a request may refer the user to another server or servers which may be able to do so. During a search operation any referrals encountered are returned with the entries located and the user (or client) has the option of continuing the search on the servers indicated. This allows federation of directories which means that multiple LDAP/X.500 servers can present to the user a unified namespace and search results even though they are at widely separated locations and the implementations may actually be very different.
  • JNDI Java Naming and Directory Interface
  • Sun Java Naming and Directory Interface
  • JNDI may remove many of the limitations of LDAP as an OODB by providing a standard way to identify the Java class corresponding to a directory entity and instantiate it at runtime. It also allows storage of serialized Java objects as attribute values. Sun has proposed a set of standard attributes and objectClasses to do this.
  • Monoclonal antibodies are distinguished by cloneName or clone, which is unique within the parent entity, which must be an investigator or organization.
  • Lymphocyte differentiation antigens: a thesaurus of the target specificities of monoclonal antibodies, which would include but not be limited to the official CD names.
  • Directory searches may also be a tool available in the database. Information may be promoted upward from the documents into the directory for searching and no searching is done within the documents.
  • a search application may use the LDAP search functions to retrieve a set of candidate XML documents (based on their directory attributes) and then may use XQL or XPath to further refine this set.
  • a unified interface may be provided that would largely make the differences in search strategies transparent to the user. The user then may be able to select (search and retrieve) for items within the document that are not reflected in the directory or may extract elements from these documents, e.g., samples from a set of experiments.
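The second, document-level stage of this two-stage search can be sketched with the standard javax.xml.xpath API. The element and attribute names (sample, treatment) are hypothetical; the candidate document stands in for one returned by the LDAP stage.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RefineSearch {
    // After an LDAP search returns candidate XML documents (matched on
    // their directory attributes), an XPath expression tests content
    // inside each document that is not reflected in the directory.
    static boolean matches(String xml, String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(xpath, doc, XPathConstants.NODESET);
        return hits.getLength() > 0;
    }

    public static void main(String[] args) throws Exception {
        String candidate =
            "<experiment><sample subject='S01' treatment='anti-CD4'/></experiment>";
        System.out.println(matches(candidate,
            "//sample[@treatment='anti-CD4']")); // true
    }
}
```

A unified interface would run the directory search first and apply an expression like this to each hit, so the user never sees which stage answered the query.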
  • the instruments may be responsible for collecting, annotating and exporting the collected experimental data.
  • the instruments may annotate it with information generated during the data collection and transmit the annotated primary data to the LDAP server for storage in the database in association with the appropriate XML-encoded experiment and study descriptions.
  • the following modules may be used to perform these functions: a) Set-up module(s) - automate aspects of instrument set-up and standardization; record and visualize relevant instrument information; acquire and respond to user input.
  • b) Data collection module(s) - collect primary (instrument-generated) data for the aliquots of each sample; visualize protocol information to facilitate data collection; acquire and respond to user input; record machine condition and user comments specific to each data collection.
  • the central database may be a large scale (terabyte level), web accessible, central storage system coupled with small-scale volatile storage deployed locally in a manner transparent to the user.
  • This system may store data and annotation information transmitted from the data collection system.
  • it may catalog the stored data according to selected elements of the structured annotation information and may retain all catalog and annotation information in a searchable format.
  • industry standard formats for storing data and annotation information will be implemented. If no standard is available, interim formats may be used and may allow for translators to industry standards once the industry standards become available.
  • the database may capitalize on the built-in replication and referral mechanisms that allow search and retrieval from federated LDAP networks in which information can be automatically replicated, distributed, updated and maintained at strategic locations throughout the Internet. Similarly, because pointers to raw data in LDAP are URLs to data store(s), the database may capitalize on the flexibility of this pointer system to enable both local and central data storage.
  • the database may enable highly flexible, owner-specified "fine-grained" access controls that prevent unauthorized access to sensitive information, facilitate sharing of data among research groups without permitting access to sensitive information, and permit easy global access to non-sensitive data and analysis results.
  • the central database may also allow for the retrieval of annotated data sets (subject to owner-defined accessibility) via catalog browsing and/or structured searches of the catalog;
  • the central database may also automatically verify authenticity of the data based on the data's digital signature. This function may be accomplished by launching internal and co-operating data analysis and visualization programs and transferring the data and annotation information to the program. Further, the database may put the data and annotation information into published-format files that can be imported into data analysis and visualization programs that do not provide launchable interfaces.
  • the central database may also allow for retrieval of analysis output. This function may be accomplished by recovering/importing the analysis output and linking it with primary and annotation data to provide access to findings via subject and treatment information that was entered at the study and experiment levels.
  • Flow cytometry 1 is a technique for obtaining information about cells and cellular processes by allowing a thin stream of a single cell suspension to "flow" through one or more laser beams and measuring the resulting light scatter and emitted fluorescence. Since there are many useful ways of rendering cells fluorescent, it is a widely applicable technique and is very important in basic and clinical science, especially immunology. Its importance is increased by the fact that it is also possible to sort fluorescent-labeled live cells for functional studies with an instrument called the Fluorescence Activated Cell Sorter (FACS). At our FACS facility alone, we have processed millions of samples in the last 15 years.
  • ISO International Standards Organization
  • X.500 directory servers 3 achieve uniqueness with distinguished names (dn) that are assigned hierarchically.
  • X.500 3 is the core of a set of standards adopted by the International Standards Organization (ISO) beginning in 1988, which defines what may be simply called directory service.
  • a directory is fundamentally a database. Directories were originally defined in order to allow users and their agents to find information about people, typically their telephone number but possibly including postal address, e-mail address and other information. This was extended to include documents, groups of users and network accessible resources such as printers and, more recently, databases. Three parts of the standard are of particular interest: the information model, the functional model and the namespace.
  • the X.500 information model is very powerful and flexible.
  • the standard defines entries, which have a set of named attributes that can have one or more values or may be absent.
  • Each attribute has a name and a type, and each type has a name and a syntax which is expressed in Abstract Syntax Notation One (ASN.1).
  • ASN.1 Abstract Syntax Notation One
  • Every entry must have an attribute objectClass which defines what attributes are possible and which are required and may have an attribute aci (for access control information) which the server uses to control access to the entry.
  • Object classes are hierarchical, i.e., a class can inherit attributes from a parent class and, by defining new attributes, extend its scope.
  • the entries in a directory are organized hierarchically. That is to say that any entry may have one or more subentries so that the whole structure may be visualized as a tree.
  • rdn relative distinguished name
  • the functional model defines a set of operations which may be applied to a directory: read, list, search, add, modify and delete (which are pretty much self-explanatory), and bind, unbind and abandon, which are used to establish the user's credentials, end a connection to the server and cancel a running query, respectively.
  • the search function starts from a root dn and finds all entities further down in the hierarchy which pass a search filter constructed from the "usual suspects", i.e., equal, less than, contains, sounds like etc. applied to the attributes of the entity.
  • a search filter may of course test the objectClass attribute and return only entries of a particular type. Clients can specify searches which return all the attributes of each entry or only a selected set of attributes.
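The search filters described above are ordinarily expressed as strings in the standard LDAP filter syntax. The sketch below composes the "entries of a particular type" filter mentioned above, escaping the characters the filter grammar reserves; the objectClass and attribute names are illustrative (cloneName is taken from the monoclonal antibody discussion elsewhere in this document).

```java
public class Filters {
    // Escape the characters reserved in LDAP search filter values
    // (*, parentheses and backslash) as backslash-hex sequences.
    static String escape(String v) {
        StringBuilder sb = new StringBuilder();
        for (char c : v.toCharArray()) {
            switch (c) {
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\\': sb.append("\\5c"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // AND an objectClass test with an attribute test, so the search
    // returns only entries of a particular type.
    static String typedSearch(String objectClass, String attr, String value) {
        return "(&(objectClass=" + escape(objectClass) + ")("
             + attr + "=" + escape(value) + "))";
    }

    public static void main(String[] args) {
        System.out.println(typedSearch("monoclonalAntibody", "cloneName", "GK1.5"));
        // (&(objectClass=monoclonalAntibody)(cloneName=GK1.5))
    }
}
```

The resulting string is what a client passes to the LDAP search operation along with the root dn and scope.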
  • DAP Directory Access Protocol
  • OSI Open System Interconnect
  • LDAP Lightweight Directory Access Protocol 4,5
  • LDAP adopts the X.500 data model essentially intact. It simplifies the functional model by collapsing the read, list and search functions into a single search function with object, one level or sub tree scope respectively. It handles distinguished names as strings rather than the structured objects that DAP uses which transfers the responsibility for parsing them to the server. Conversely most of the responsibility for interpreting the attribute values reverts to the client.
  • LDAP returns the results as individual packets which allows lightweight clients to process result sets which they cannot store in memory. LDAP does not include much of the elaborate security and authentication mechanisms used by DAP and also simplifies the search constraints to the maximum number of entries to return and maximum time to spend searching.
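The point above that LDAP handles distinguished names as strings, leaving parsing to the endpoints, can be seen with the standard javax.naming.ldap.LdapName class. The DN value below is illustrative; note that index 0 addresses the rightmost, most significant RDN.

```java
import javax.naming.ldap.LdapName;

public class ParseDn {
    // LDAP transmits the DN as a plain string; LdapName performs the
    // client-side parse into relative distinguished names (RDNs).
    static String rdnAt(String dn, int i) throws Exception {
        return new LdapName(dn).getRdn(i).toString();
    }

    public static void main(String[] args) throws Exception {
        String dn = "cn=GK1.5,ou=Reagents,o=Stanford University,c=US";
        System.out.println(rdnAt(dn, 0)); // c=US (rightmost, most significant)
        System.out.println(rdnAt(dn, 3)); // cn=GK1.5 (leftmost, most specific)
    }
}
```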
  • Unfortunately, one X.500 function known as referral was not included in LDAP v2. A referral allows one DSA to direct the client to try again on a different DSA. An LDAP v2 server is supposed to follow all referrals on behalf of the client and not return them to the client at all.
  • LDAP v2 5 was proposed to the Internet Engineering Task Force (IETF) as a draft standard but was not adopted due to its technical limitations. This led to the effort to define a more acceptable version. Also in this period, the utility of stand-alone LDAP servers, i.e., servers which implement the information and functional models directly rather than relying on a higher tier of X.500 servers, became clear.
  • IETF Internet Engineering Task Force
  • LDAP v3 6 addresses the problems discussed above and was adopted by IETF in 1998 as a proposed standard for read access only. The IETF feels that the authentication mechanisms are inadequate for update access but has allowed the standard to proceed for read access when some other means of updating is used. (See also, Hodges 7 ).
  • a useful metaphor for directory service is the rolodex or a box of 3X5 cards.
  • directory servers manage smallish packets of information (a directory entry or card) associated with named persons or organizations that can record a diverse set of attributes.
  • Directory service is not simply a billion card rolodex however because the servers don't just maintain the information, they will search through it for you and return only selected information. Servers can also suggest other servers (referrals) to enlist in the effort, i.e., you may end up searching several directories to get a result but not need to be aware of this.
  • Directory servers do not perform the join operation that relational databases use to combine information from different tables. Instead they offer increasing flexibility in representing and searching for information.
  • An attribute of an entry in a directory may be missing or have multiple values. While it is possible to represent multiple values in relational form, it requires introducing new tables and joins, i.e., substantial overhead and complexity, so it is generally not done unless it is necessary. Missing values are usually supported in relational databases but typically require storing a special missing-data value. The low overhead for missing and multiple values in a directory makes it much easier to accommodate rarely used attributes and occasional exceptions such as persons with multiple telephone numbers. Directories are organized and searched hierarchically. Again, it is possible to do this with SQL stored procedures and temporary tables, but it is awkward.
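The low-overhead handling of missing and multi-valued attributes described above can be sketched as a map from attribute name to value list; the attribute names are the conventional directory ones (cn, telephoneNumber, postalAddress) used only for illustration.

```java
import java.util.List;
import java.util.Map;

public class MultiValued {
    // A directory entry as attribute-name -> value-list. A missing
    // attribute is simply an absent key, and a person with two telephone
    // numbers just has two values - no extra tables or joins.
    static Map<String, List<String>> entry() {
        return Map.of(
            "cn", List.of("Jane Doe"),
            "telephoneNumber", List.of("555-0100", "555-0101"));
    }

    public static void main(String[] args) {
        System.out.println(entry().get("telephoneNumber").size()); // 2
        System.out.println(entry().containsKey("postalAddress"));  // false
    }
}
```

In a relational schema, the second phone number would force a separate telephone table and a join; here it costs one more list element.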
  • a directory in many ways is an object oriented database.
  • the difference between directory service and a traditional OODB is that a directory associates attributes with objects but not methods and that binding to the attributes is done at runtime as a lookup operation rather than at compile time.
  • the first means that you can retrieve arbitrary data from an object but the only functions you can perform on it are the search, add, modify, delete etc. defined by LDAP.
  • the latter consideration is similar to the relationship of interpreted BASIC to compiled higher-level languages, with analogous benefits (to the programmer and user) of simplicity, flexibility and rapid development, and costs (to the computer) in performance.
  • Frames are a data structure commonly used in artificial intelligence shells. The key feature of frames is that they inherit properties from their parents. Directory entries do not do this because objectClasses inherit attributes but not attribute values from their parents. However, this functionality can easily be implemented on the client side.
  • a more flexible scheme would be to define an entry of class aiFrame to include a dn valued attribute aiParentFrame and to trace that. Eventually it might be beneficial to move this to the server side either by defining an LDAP extension or by defining a new ancestor scope option for the search function.
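The client-side aiFrame scheme described above amounts to a chain lookup: resolve an attribute on the entry itself, else follow aiParentFrame upward. A minimal sketch, with the directory reduced to an in-memory map (the aiFrame/aiParentFrame names come from the text; the example entries and attributes are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class AiFrame {
    // Stand-in for the directory: dn -> entry attributes.
    static Map<String, Map<String, String>> directory = new HashMap<>();

    // Frame-style value inheritance on the client side: if the entry
    // lacks the attribute, trace its aiParentFrame chain upward.
    static String lookup(String dn, String attr) {
        Map<String, String> entry = directory.get(dn);
        if (entry == null) return null;
        if (entry.containsKey(attr)) return entry.get(attr);
        String parent = entry.get("aiParentFrame");
        return parent == null ? null : lookup(parent, attr);
    }

    public static void main(String[] args) {
        directory.put("cn=mammal", Map.of("legs", "4"));
        directory.put("cn=dog",
            Map.of("aiParentFrame", "cn=mammal", "sound", "bark"));
        System.out.println(lookup("cn=dog", "sound")); // own value: bark
        System.out.println(lookup("cn=dog", "legs"));  // inherited: 4
    }
}
```

Moving this trace server-side, as the text suggests (an LDAP extension or an "ancestor" search scope), would replace the repeated lookups with a single request.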
  • URLs Uniform Resource Locators
  • DNS Domain Name System
  • LDAP server toolkits
  • Standalone servers and LDAP to X.500 gateways are available from several sources.
  • LDAP client libraries are available for the C language from Univ. Michigan and Netscape and for the Java language from Sun and Netscape.
  • LDAP is a standard which is directly utilized by the clients and all clients should be able to talk to all servers.
  • SQL standardization has more to do with transportability of programmers and database schema than interoperability of databases.
  • the X.500 information model is extremely flexible and search filters provide a powerful mechanism for selecting entries, at least as powerful as SQL and probably more powerful than typical OODB.
  • the standard defines an extensibleObject which can have any attribute, and some standalone LDAP implementations permit relaxed schema checking, which in effect makes any object extensible. Since an attribute value may be a distinguished name, directory entries can make arbitrary references to one another, i.e., across branches of the directory hierarchy or between directories.
  • Some LDAP and X.500 servers 11 permit fine grained access control. That is to say that access controls can be placed on individual entries, whole sub trees (including the directory itself) and even individual attributes if necessary. This level of control is not available in most existing databases.
  • Referrals mean that one server which cannot resolve a request may refer the user to another server or servers which may be able to do so. During a search operation any referrals encountered are returned with the entries located and the user (or client) has the option of continuing the search on the servers indicated. This allows federation of directories which means that multiple LDAP/X.500 servers can present to the user a unified namespace and search results even though they are at widely separated locations and the implementations may actually be very different.
  • JNDI Java Naming and Directory Interface 12
  • Sun Java Naming and Directory Interface 12
  • JNDI removes many of the limitations of LDAP as an OODB by providing a standard way to identify the Java class corresponding to a directory entity and instantiate it at runtime. It is also possible to store serialized Java objects as attribute values. Sun has proposed a set of standard attributes and objectClasses to do this.
  • Monoclonal antibodies are distinguished by cloneName or clone, which is unique within the parent entity, which must be an investigator or organization.
  • Lymphocyte differentiation antigens: a thesaurus of the target specificities of monoclonal antibodies, which would include but not be limited to the official CD names.
  • X.500 defines a sparse set of standard types and standard objects, mostly for describing persons and documents and more suitable for business than scientific use. However, if types were added for scientific use, particularly real numbers and possibly dimensional units, much scientifically relevant information could be conveniently stored in and accessed from directories. The following minimal set of objects for the field of flow cytometry is presented to lend concreteness to the discussion. A fuller and formal definition will follow.
  • JNDI SPI Java Naming and Directory Service Provider Interface
  • Sun Microsystems (1998)
  • This feature allows the user to specify the reagents used in a protocol and also summarizes the reagent list visually for the user.
  • in the demo, it is the upper left panel on the main screen.
  • the "palette” which is a list of individual reagents
  • the "cocktails” which represent combinations of reagents which occur frequently.
  • the reagent palette is populated by copying or referencing entries from a number of sources.
  • the user may have a "bag of tricks" which is a light-weight db of reagents probably serialized into a local file in which they store frequently used reagents and copies of cataloged reagents for use on portables etc.
  • the user may open multiple protocols and copy/paste or drag/drop reagents between protocols.
  • the user may define a new reagent by supplying the required information. An attempt should be made to check for conflicts with the catalog and in most cases to try to catalog the new reagent.
  • the user should be warned of consistency violations but allowed to enter them. The editor should also be able to be told to accept them without comment in the future (for this protocol).
  • Each column in the table is a sample attribute. It has a string name and a data type, which ultimately should be drawn from the same set that JMP uses (or a superset), but string and number would get us started.
  • the user can define new attributes, redefine an existing attribute or copy one or more from other protocols or possibly from the bag of tricks.
  • the user can reorder the columns at will by dragging.
  • Each row in the palette represents one sample
  • the collator allows the user to sort and resort the sample palette as needed and also facilitates logical group selection. It is implemented in the demo by a special "column" which has an icon label, no data and a different background.
  • the user can drag the collator like any other column (and can drag other columns over it).
  • the data grid is always sorted by the attributes to the left of the collator in left to right precedence.
  • the background of the cells in the first column is colored with two colors, and the color toggles every time the value of that column changes.
  • the second column changes color every time the value in either the first or second column changes and similarly for the others, i.e., in each column a block of color represents all the rows which match at and to the left of the column.
  • Selecting any cell in a column to the left of the collator means selecting all the rows which match in this column and to the left, i.e., the complete color block. Such a selection can be copied to the experiment models.
  • this should be a logical definition, i.e., if new entries are made which match the criterion they should be propagated forward but this may be too complicated.
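The color-block rule described above (a column's background toggles whenever the values in that column and all columns to its left change from the previous row) can be sketched as a small pass over the sorted grid. The sample values are illustrative.

```java
public class Collator {
    // For each cell, pick one of two background colors (0 or 1), toggling
    // whenever the row's values in this column and every column to its
    // left differ from the previous row's. A block of one color therefore
    // marks all the rows that match at and to the left of that column.
    static int[][] colorBlocks(String[][] rows) {
        int cols = rows[0].length;
        int[][] color = new int[rows.length][cols]; // row 0 is all color 0
        for (int r = 1; r < rows.length; r++) {
            boolean changed = false;
            for (int c = 0; c < cols; c++) {
                changed = changed || !rows[r][c].equals(rows[r - 1][c]);
                color[r][c] = changed ? 1 - color[r - 1][c]
                                      : color[r - 1][c];
            }
        }
        return color;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"mouse1", "spleen"},
            {"mouse1", "thymus"},  // col 0 unchanged, col 1 toggles
            {"mouse2", "spleen"},  // both columns toggle
        };
        int[][] c = colorBlocks(rows);
        System.out.println(c[1][0] + " " + c[1][1] + " "
                         + c[2][0] + " " + c[2][1]); // 0 1 1 0
    }
}
```

Selecting a color block then corresponds to selecting the maximal run of rows whose colors agree in that column.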
  • the experiment model widget has two side by side panels and you can copy reagents or reagent cocktails into one and samples into the other.
  • the user may change the staining volume (typically 100 µl) for the model.
  • the default is a user preference.
  • the user can transpose the layout of the model, i.e., N X M vs M X N grid pattern on the plate, the default being a user preference.
  • the user specifies a target sample and a reagent and then a range of dilutions 2, 4, 8... or
  • the user specifies a reagent cocktail and a large number of samples which are quasi- automatically named.
  • Staining is typically done in "micro-titre" plates, which are an 8 X 12 array of small wells. Other form factors should be available, however, probably as a user preference. An experiment may require several plates (all of the same form factor). Some users prefer to skip every other row and/or column.
  • the corresponding experiment model should be scrolled on screen and highlighted, and the sample and reagent cocktail information should be highlighted both in the model and in the palettes.
  • Plate/rack form factor (8 X 12 by default).
  • the desk model is that the user selects a well on the plate map, a dialog with the annotation information for that well is presented, and the user can edit the information (it is not clear what editing means in this case, since the information may come from a database).
  • the user can start, pause, abort or finish collection. Starting collection should start the clock for kinetics data.
  • Protocol editor will then make an entry describing the sample using
  • IBRSS Internet-based Research Support System
  • in Phase I, we proposed to build the core IBRSS system that will provide the central computing capabilities necessary to receive and catalog data annotation information acquired at remote sites and to move large instrument-generated data sets from remote sites to the central IBRSS site.
  • in Phase II, we proposed to enable remote users to search the catalog and retrieve data stored in the system.
  • in Phase II, we propose to complete this system by providing tools for acquiring study and experiment annotation information (protocols), by improving the catalog searching tools to enable searches on additional information, and by implementing tools that will enable launch of third-party analysis and viewing software and other tools to facilitate data usage and interpretation.
  • Protocol editors to acquire annotation information Managing data - storing it, finding it and, most importantly, extracting answers to the questions the experiment was meant to answer - requires the ability to combine the data itself with annotation information recorded in study and experiment protocols, which are constructed by the investigator prior to data collection and dictate how the experiment will be done. The complexity of this process is outlined in figure 1, which charts the way information is collected, analyzed and stored in typical biomedical studies.
  • the basic IBRSS protocol elements Analyzed from a systems point of view, the capture of information required to utilize machine-generated data in a typical experiment is conceptually organized into several information capture protocols: 1) study protocols, which capture the hypotheses to be tested and the factors that go into them, including subjects, treatments, experiments and the timeline for an overall study; 2) experiment protocols, which dictate the details of treatments for samples in the experiment; 3) data collection protocols, which specify the samples and reagents that will be put in the test tubes, the planned incubation time and conditions, the specific instruments that will be used for data collection and any instrumentation settings unique to the experiment.
  • study protocols which capture the hypotheses to be tested and the factors that go into them, including subjects, treatments, experiments and the timeline for an overall study
  • experiment protocols which dictate the details of treatments for samples in the experiment
  • data collection protocols which specify the samples and reagents that will be put in the test tubes, the planned incubation time and conditions, the specific instruments that will be used for data collection and any instrumentation settings unique to the experiment.
  • protocols also contain global identifiers for reagents and appended notes concerning anomalies that occurred during sample addition, incubation or data collection; and 4) analysis protocols, which specify calls for analyses, e.g., to determine subset frequencies, median fluorescences, etc. These protocols may be specified before and/or after data collection and will likely be passed to co-operating third-party analysis software.
  • the Phase II funding requested here will support the development of designs and prototypes for user interfaces (protocol builders) that will enable structured capture of detailed experiment and study descriptions and will provide unique support for experiment planning and data analysis.
  • These interfaces will be designed to function in a JAVA-based client-server environment (IBRSS) and will support the structured entry of study and experiment information by presenting the user with a broad array of relevant standardized choices to be selected for entry into a standardized set of fields.
  • IBRSS JAVA-based client-server environment
  • we will design the interfaces with associated "Wizards" to guide the user in making choices among options and in structuring the protocol so that it contains controls appropriate to the experiment.
  • the FACS itself simply measures cell-associated fluorescence and light scatter for individual cells passing single file, in a laminar flow stream, past a set of light detectors.
  • the cell-associated fluorescence results from "staining" (incubating) cells with fluorochrome-coupled monoclonal antibodies or other fluorogenic or fluorescent molecules that bind specifically to molecules on, or in, cells.
  • As each cell passes the FACS detectors it is illuminated by a set of lasers that excite the fluorescent molecules associated with the cell. This causes the cell to scatter light and to emit fluorescent light at wavelengths defined by the associated fluorochromes.
  • the amount of light derived from the cell is then measured by the detectors, which are set to measure the light emitted at particular wavelengths or scattered at particular angles.
  • the measurements made by each of the FACS detectors are processed, digitized, joined and recorded on a cell-by-cell basis in a data file that has one such record for each cell analyzed.
  • 4-13 measurements per cell are collected for at least ten thousand, and sometimes up to 5 million cells. This "FACS analysis" usually takes less than a minute and 10-100 samples are typically passed through the FACS in a single session.
  • FACS/Desk users Before collecting FACS data, FACS/Desk users typically file a protocol in which they enter short free-text descriptions of the reagents and cell types used in each sample. This information is displayed during data collection and permanently associated with the data once collected. It is then maintained within FACS/Desk until the user calls for it to be exported, along with the actual data, to analysis/visualization modules. Cooperating analysis modules (e.g., FACS/Desk itself or the FlowJo software) use this information to label axes on graphs and column heads on tables; IBRSS will use it additionally to catalog the data so that it can be retrieved based on any combination of information included in the protocol.
  • Cooperating analysis modules e.g., FACS/Desk itself or the FlowJo software
  • IBRSS will use it additionally to catalog the data so that it can be retrieved based on any combination of information included in the protocol.
  • the new protocol builders will collect standardized, rather than free-text, entries wherever possible to make catalog searching more efficient. In addition, they will have modern interfaces (rather than the antique interface in FACS/Desk) and associated Wizards. Thus, as the protocol builders mature and are modified according to user feedback, they will constitute excellent models for the development of protocol builders to serve biomedical instrumentation other than FACS.
  • Experiment model vs data model. Programmers working with complex systems and databases commonly begin by creating a data model and working from that. However, for our purposes here, a data model per se is likely to be too concrete. The following five data model items loosely define a somewhat abstract approach to the experiment model: the first two items give an example of the abstract data model; the next three provide concrete examples of samples that are encountered in the protocol editor. We use FACS studies here as concrete examples. Similar definitions, tailored to other technologies, will be developed as our work proceeds.
  • Attributes In statistics-speak, an attribute is called a "random variable", but this terminology seems only to confuse biologists. Attributes have names (usually unique within a restricted framework). An attribute's values may be "nominal", "ordinal" or "continuous". It may be an "independent" or "dependent" variable in a model. These are hints to the statistics assistant as to how to treat this attribute as a factor in a model.
  • An attribute may be "internal", "external" or "computed".
  • An internal attribute is created by the protocol editor and stored in the experiment document.
  • External attributes are links to data in external files or databases, e.g., JMP tables, SQL databases, or LDAP directories, that must be keyed by some attributes of the sample. Examples in a clinical study include demographic data, vital signs or clinical incidents.
  • Computed attributes are scalar-valued statistics computed from the cell data for a FACS Sample. For completeness, one could make a case for attributes computed from the existing attributes of a sample as well.
  • Database entries mapping subject ids to patient information and assigning trial arms are special attributes and must be isolated and protected specially. External or computed attributes might be cached for efficiency, but private or blinded information should not be cached.
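The attribute taxonomy described above (nominal/ordinal/continuous values; internal, external or computed sources; caching restrictions for private data) can be sketched as a small data model. This is an illustrative sketch only, assuming names (Attribute, kind, source, cacheable) that are not part of the system described.

```python
from dataclasses import dataclass

# Hypothetical sketch of the attribute model; field names are assumptions.
NOMINAL, ORDINAL, CONTINUOUS = "nominal", "ordinal", "continuous"
INTERNAL, EXTERNAL, COMPUTED = "internal", "external", "computed"

@dataclass
class Attribute:
    name: str                  # usually unique within a restricted framework
    kind: str                  # nominal / ordinal / continuous
    source: str = INTERNAL     # internal / external / computed
    independent: bool = True   # hint for the statistics assistant
    cacheable: bool = True     # private or blinded data must never be cached

strain = Attribute("mouse strain", NOMINAL)
gsh = Attribute("median glutathione fluorescence", CONTINUOUS,
                source=COMPUTED, independent=False)
subject_map = Attribute("subject id to patient", NOMINAL,
                        source=EXTERNAL, cacheable=False)
```

A protocol builder holding such objects could, for instance, refuse to cache `subject_map` while freely caching `gsh`.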
  • Abstract Sample. This is a placeholder that defines things common to the concrete samples defined below.
  • a sample may associate one or more attributes with values (of the appropriate type) and inherits attributes and their values from a super-sample if it has one.
  • a sample may be excluded; if it is, all of its sub-samples are excluded as well. Who excluded the sample, and when and why it was excluded, should be part of the record. Examples of an exclusion might be a non-compliant patient, a bad blood draw, or an instrument malfunction such as a nozzle clog. Excluded samples may be graphed and analyzed using FACS-specific methods but are normally excluded from meta-analysis, i.e., are not exported to JMP etc. for final experiment or study analysis. Obviously one way of handling the exclusions is as a special form of attribute.
  • the value assigned to a subject is essentially an identifier that is unique (at least) to an experiment and may be unique with respect to the study of which the experiment is a part. In addition, it may even be unique globally (e.g., a distinguished name).
  • a subject is technically, that is statistically, a sample from a larger population (say of mice or men). It may have one or more attributes and may have or require an attribute, e.g., "subject type" with values "human", "mouse", "cell line", etc., but this should be a hint to the "protocol expert" on how to initialize the default and predefined attributes at the interface level, not a polymorphism in the data model.
  • the identifier (or identifiers) must allow for linking the data to external sources but definitely should not include identifying information about human patients. Inclusion of such information would subject experiment data to stringent legal requirements for access control and encryption that would interfere with collaboration.
  • This identifier may also be used as a key for blinded data, which is defined in the study data model but not available until after FACS analysis (and the rest of the data collection) is complete.
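The abstract-sample behavior described above, attribute inheritance from a super-sample and exclusions that propagate to sub-samples, can be sketched as follows. The Sample class and its method names are hypothetical, not part of the system described.

```python
# Minimal sketch: a sample inherits attribute values from its super-sample,
# and excluding a sample implicitly excludes all of its sub-samples.
class Sample:
    def __init__(self, name, parent=None, **attrs):
        self.name, self.parent, self.attrs = name, parent, attrs
        self._excluded = None   # (who, when, why) once set

    def get(self, attr):
        # look up locally, then walk up the super-sample chain
        if attr in self.attrs:
            return self.attrs[attr]
        return self.parent.get(attr) if self.parent else None

    def exclude(self, who, when, why):
        self._excluded = (who, when, why)   # part of the permanent record

    @property
    def excluded(self):
        if self._excluded:
            return True
        return self.parent.excluded if self.parent else False

subject = Sample("mouse 7", strain="BALB/c")
draw = Sample("week 2 blood", parent=subject, tissue="peripheral blood")
subject.exclude("tech", "2001-05-18", "nozzle clog")
```

With this structure, the blood draw reports `strain == "BALB/c"` by inheritance, and excluding the subject automatically excludes the draw from meta-analysis.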
  • Cell samples are obtained from subjects at a particular time. Subjects may be sampled more than once, either by taking multiple samples at a single sitting or (usually) by sampling repeatedly over time. Cell samples inherit the attributes of the subject and may add new ones, including: time of sampling; the sampled tissue (e.g., peripheral blood, bone marrow); how the sample was handled (e.g., ACD, Heparin, put through Ficoll, etc.); and the sample's role in the study (Screening, 0 week, 2 week, ... 8 week). Cell samples may need to be able to be linked with external data such as vital signs or clinical lab reports, for example, to compute absolute CD4 counts.
  • a stained sample copies all the attributes that are factors in the staining model.
  • a new nominal attribute named for the color is added, whose value is typically the specificity of the first antibody in the reagent complex of that color. This is used later in labeling visualizations of the sample.
  • a stained sample has a target cell count and target volume that are needed to compute the pipetting instructions. It must be associated with some coordinate that allows it to be identified to the data collector (currently simply row and column).
  • a technical assistant for an experiment in which samples are to be stained with several sets of reagents, each in a separate test tube, can advise the user as to the minimum number of cells required per sample for the sample to be aliquotted into all of the specified staining tubes.
  • the subjects will be identified by strain and either by animal number or cage number.
  • the user will create a list of subjects that includes mouse strain, and a model that, for example, has crossed attributes such as immunization, treatment or mouse strain that represent the actions and variables in the experiment.
  • the user can then assign subjects to the groups defined by the data model or they can request the assistant to distribute the subjects. Since mouse strain appears in both models, the assistant must account for this in making the assignment.
  • the list of cell samples is then the sample model for the protocol.
  • the system will remember that the cross of immunization and treatment and mouse strain is a sub-model. Everything up to this point has involved independent variables.
  • the data model is important enough and complex enough to warrant explicit definition, perhaps as part of a study.
  • patients are identified by an anonymous identifier (nominal) with private and blinded information stored separately. Patients are also stratified into CD4 low and high (nominal) and then divided into glutathione low and high by the median FACS staining value for this parameter within each class computed separately (ordinal).
  • Clinical lab results come in as a dBaseII file (keyed by patient's initials and date). Demographics and vital signs are commonly in FileMaker databases. Patients are randomized into clinical trial arms (drug vs. placebo) by a third party. Blood is collected at 2-week intervals from 0 to 8 weeks.
  • the sample model for the study is subject crossed with week of visit (ordinal) and CD4 stratum (nominal) and then nested with glutathione, which is ordinal.
  • the sub-model, which will be analyzed statistically, is week of visit crossed with trial arm and CD4 level and then nested with glutathione level.
  • the sample model will be a list of cell samples. At a minimum, the attributes of these samples will include the patient id and the week of visit and will represent an instance of the subject crossed with visit sub-model of the study. Everything so far again involves independent variables.
  • the user must also prepare (in unspecified fashion) a reagent model of similar structure and possible complexity.
  • there are likely to be a small number of reagent cocktails.
  • the reagent model is a single attribute whose value is the name of the reagent cocktail
  • Reagent attributes are independent. Reagent models may also in rare cases be nested, e.g., an isotype experiment performed with allotype reagents.
  • the reagent model is crossed with a sample model (a list of cell samples) in the experiment model to generate a set of Stained samples.
  • a FACS sample is defined as the running of a stained sample on a FACS instrument under a specific set of conditions. A FACS sample inherits attributes from the stained sample and adds scale information and start and stop timestamps (locators in the instrument log that allow reconstruction of the instrument state at the time of sampling). Analogous to sub-wells in the data model, if a stained sample is FACS sampled more than once, each sampling is treated as a separate FACS sample and given a unique sequence identifier.
  • FACS Data Set. A FACS data value for each parameter for each cell in the FACS sample is collected. The values for all cells comprise a FACS data set for a given FACS sample. In addition to the raw cell data, the FACS data set must also include information about the scaling of the cell data. It inherits the stained sample's attributes, which are used to label the data output graphically, usually the specificity for each color. FACS data may have computed attributes which make statistical summary information about the cell sample available as an attribute of the data set.
  • FACS Data Subset. Sometimes a FACS data set is divided into several pieces, each containing a subset of the cells and the values recorded for those cells. These subsets inherit attributes and may also get a new independent nominal or ordinal attribute in the process. The subsets are treated as samples in their own right and thus may have computed attributes and be subject to meta-analysis independently of the total sample.
  • the "protocol builders" capture the annotation information necessary to manage data from studies and experiments. During the execution of experiments, this information is initially used to identify the contents of samples during data collection. Next, it is used to retrieve data for analysis and to label analysis output (axes and column heads) with the sample and reagent information necessary for visualizing, interpreting and summarizing results. Finally, it is used to coalesce results from the individual data collections into the results of an experiment, and to coalesce the results of a series of experiments into the findings of a study. Since this crucial interpretive work may occur weeks, months or even years after all data collection for a study is complete, the strength of the annotation and data storage system that supports data collection is critically important to both the quality and the efficiency of scientific studies.
  • identifiers for the kinds of protocol fields one finds in FACS (and most other) experiments are generic. However, certain items will be unique to particular studies and will have to be entered directly by users. These items will be entered only once, at the appropriate level, and will be supplied as lists thereafter. For example, the study protocol will allow users to enter subject identifiers, which will then appear as selection lists for the experiment protocol generator. Users will choose from this list to identify the subjects in the particular experiment being planned and will choose from other lists to identify the type of cells in the set of samples to be tested for each subject. The user selections will then be transferred to the sample treatment protocol, where the reagents for each sample and the treatment protocol will once again be specified by selecting from lists of treatments, etc.
  • the collected information will be processed to create an XML file, which will then be transformed by an XSLT style sheet and passed (via a local proxy server) to the central LDAP server. Later, this process will be "reversed" and a copy of the information relevant to FACS data collection will be passed to the data collection modules.
  • collection-related information including the location of the data that was collected
  • Protocol information will be used at different points in the data flow, e.g., sample preparation or staining, FACS data collection, FACS data analysis, and experiment or meta-analysis.
  • Although users have a clear concept of these processes, they rarely have a sense of the data flow underlying the experiments they perform. Therefore, even if they were willing to take the time to play "twenty questions" with a protocol builder, they would be unlikely to be able to provide the information necessary to take full advantage of automation to facilitate data collection and analysis. In particular, they would be hard pressed to understand the statistics jargon in which the questions are couched (biologists have enough jargon of their own to handle).
  • the trick here is to structure the user interface such that the easiest way for the user to enter information about the experiment will provide the cues needed concerning the structure of the data. For example, since it is easier to enter two variables that are to be crossed than to fill out a whole table by hand, users can readily be convinced to simply enter the two variables and leave the crossing (filling out the final protocol table) to the technical or statistics assistant. This point is illustrated in the example that follows:
  • the subjects will typically be a number of individuals from an inbred strain, i.e., nominally identical, so they don't need to be randomized.
  • the user may want to immunize with protein X, protein Y or nothing, and then treat or not treat the animals in some way, e.g., with UV irradiation.
  • the user defines three attributes: mouse strain, which has values Strain B and Strain C; immunization, which has values X, Y or nothing; and treatment, which has values treated or untreated.
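The crossing step described above (the user enters only the factor lists; the assistant fills out the full protocol table) can be sketched as a simple Cartesian product. The factor names follow the mouse example; the data structure is an assumption for illustration.

```python
from itertools import product

# Factor lists as the user would enter them in the protocol builder.
factors = {
    "mouse strain": ["Strain B", "Strain C"],
    "immunization": ["protein X", "protein Y", "none"],
    "treatment": ["treated", "untreated"],
}

# The assistant "crosses" the factors: one table row per combination,
# 2 x 3 x 2 = 12 groups in this example.
rows = [dict(zip(factors, combo)) for combo in product(*factors.values())]
```

The user then assigns subjects to these 12 groups, or asks the assistant to distribute them, as described in the surrounding text.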
  • a technical assistant that provides worksheets for dilutions and cell counts and could identify and/or schedule the appropriate (FACS) instrument to use for data collection. It might use combinatorial methods to identify feasible combinations of available reagents and might then rank them by cost or power (by a process yet to be defined). It could also provide layout assistance and might customize the user interface to deal with different classes of experiments (e.g., mouse vs. human).
  • a Wizard can greatly facilitate the construction of the cocktail by providing a worksheet that keeps track of both the reagent and its color and assures that the desired combinations are reached.
  • Instrument control. The ScienceXchange model foresees the use of information captured by protocol builders to facilitate data collection, to permanently associate protocol information with data as it is collected, and to pass necessary information to analysis packages for statistical procedures and for labeling axes in graphs and column heads in tables. In addition, appropriate information can be displayed for each sample during manual data collection.
  • information entered at the protocol stage can drive the data collection, including specification of analysis parameters (how much, how many, how long) for individual samples and for the whole analysis.
  • a Wizard could use information entered at the protocol stage to automatically call for processing (analysis) of FACS data from simple experiments like controls or titrations. Sometimes, this processing would occur as soon as the data is collected; at other times, gating or other information would first have to be obtained from the user.
  • the Wizard might also suggest appropriate ways to visualize specified sub models based on the number, type and cardinality of the various factors.
  • Visualization of the FACS data sets (cell data) associated with FACS sample data sets and sub sets is the second major component of FACS analysis.
  • Visualization tools are used to view FACS data, to define the polygons used to compute the Boolean (gating) functions, to reduce FACS data to interpretable results and to produce publication graphics.
  • axis labels and legend information on the visualized graphics are constructed by associating the color of each FACS measurement (raw or compensated) with the value of the attribute of the same name inherited from the stained sample, e.g., CD11b labeled with fluorescein. Scale information is taken from instrumentation values recorded by the collector; other values may come from other components of the system.
  • Visualization may be used during data collection, analysis or even during construction of the protocol.
  • FlowJo (Tree Star, Inc., San Mateo, California)
  • CellQuest (Becton-Dickinson Biosystems, Milpitas, California)
  • FlowJo outputs tables of computed data (mean fluorescence, frequency, etc.) for various subsets that the user identified. At present, these tables can be imported into Excel for further processing. Alternatively, many users import them into JMP, a statistical discovery program developed and marketed by the SAS Institute (Cary, North Carolina; see section 4 below).
  • ScienceXchange Wizards were to manage the data export to FlowJo (or other computation packages) and were to accept FlowJo output, which could then be sent to JMP (or other statistics packages) along with the clinical lab values or other values necessary to obtain the final analysis results (which, after all, are what the user is looking for).
  • the statistics expert deduces that this sub-model has independent crossed nominal variables (mouse strain, immunization and treatment) and a continuous dependent variable (the median CD999 fluorescence). This allows the expert to configure the JMP ANOVA platform to test the hypothesis that the treatment or the priming or both had some effect on that population (increasing or decreasing median CD999 fluorescence). Selecting both medians would configure a MANOVA platform. Selecting an independent time variable might launch a time-series-specific platform, etc. These platforms are available in callable statistics packages such as JMP, which also generate graphical output. Selecting out the data from the BALB mouse strain, rather than all mice in the experiment, produces a sub-model with immunization crossed with treatment and the same dependent variables.
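The platform deduction described above can be sketched as a rule-based dispatch over the sub-model's variable types. This is only an illustration of the idea: the function name, the rule set and the platform labels are assumptions, and a real system would hand the configured model to JMP or another callable statistics package.

```python
# Hypothetical sketch of a "statistics expert" choosing an analysis platform
# from the structure of a sub-model. Variables are (name, kind) pairs.
def choose_platform(independent, dependent):
    """Return a platform label for the given sub-model structure."""
    dep_kinds = {kind for _, kind in dependent}
    # an independent time variable suggests a time-series platform
    if any(kind == "time" for _, kind in independent):
        return "time series"
    if dep_kinds == {"continuous"}:
        # several continuous dependents -> multivariate analysis of variance
        return "MANOVA" if len(dependent) > 1 else "ANOVA"
    return "contingency"

indep = [("mouse strain", "nominal"),
         ("immunization", "nominal"),
         ("treatment", "nominal")]
```

For the example in the text, three crossed nominal independents with one continuous dependent select the ANOVA platform; adding a second continuous dependent selects MANOVA.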
  • Protocol information. It is useful to pass protocol information to the data collection module to inform data collection and assure that the collected data is properly associated with the protocol and study information to facilitate analysis.
  • This process can be made to operate without cooperation from instrument manufacturers provided that users intervene to associate the data file collected for a given sample with the protocol information for that sample.
  • Instrument manufacturers. We intend to seek cooperation with instrument manufacturers to integrate data collection more closely with ScienceXchange capabilities.
  • ScienceXchange will have to either create analysis and visualization packages capable of utilizing this information or arrange cooperative development with third party software vendors who want to capitalize on the market that these capabilities address.
  • Our experience to date suggests that vendors will readily be found for this purpose.
  • a path has already been developed that enables passage of protocol information (axis labels, etc.) to FlowJo and discussions are in progress to enable passage of additional information and acquisition of FlowJo output into ScienceXchange.
  • our discussions with Becton-Dickinson concerning data acquisition will also extend to developing an interactive route for work with their analysis package (CellQuest).
  • the IBRSS technology is based on prototype server software to be licensed from Stanford.
  • the Herzenberg laboratory has hosted this development and will continue to host further server development, including establishment of the IBRSS alpha test version(s).
  • the IBRSS alpha test site will be the Herzenberg laboratory (Genetics Department, Stanford University School of Medicine), where FACS/Desk and the prototype for IBRSS was developed and where the initial IBRSS developer, Wayne Moore, is still employed as the senior software engineer.
  • the first two beta sites will be located at Fox Chase Cancer Center (under Richard (Randy) Hardy's direction) and at the University of Iowa School of Medicine (under Morris Dailey's direction). These sites were chosen because they have currently operating FACS/Desk installations.
  • the Stanford site has users who have pioneered the use of various FACS/Desk capabilities and provided the alpha test site for FlowJo software, which was partially developed under Wayne Moore's supervision before migrating out into the commercial world. Investigators in the Herzenberg laboratory are thus trained to report bugs, find workarounds and generally co-exist with alpha level software. They are anxious to move to the IBRSS system despite this experience and look forward to commercial alpha support rather than the developer support that has been available to date.
  • Coda: LDAP directories in the service of biomedical studies
  • XSLT style sheets were developed to provide the information for rendering XML documents for viewing in browsers. However, recognizing that this transformation process is not restricted to rendering documents for viewing, ScienceXchange is putting it to unique uses in the scientific arena.
  • the stylesheet transformation language defines the transformation of the original input (XML) document to "formatting objects" such as those included in HTML documents. In a traditional style sheet, these are then rendered for viewing.
  • the XSLT transformation grammar can also be used to transform XML documents from one form to another, as in the following examples: a) Loading directories.
  • XSLT can be used to transform an XML file generated by any data processing application to an XML representation of a directory (sub)tree, i.e., to extract directory entries from the XML document.
  • the ability to use XSLT for this transformation greatly simplifies the creation and maintenance of LDAP or other directories that serve diverse information derived from distinct sources (e.g., FACS instruments and genome data banks) that generate different types of XML documents.
  • using XSLT removes the necessity for writing distinct Java code to construct the directory entries for each type of document. Instead, appropriate "directory styles" can be defined for each document type and a single Java program can be written to process all XSL-transformed documents into the directory tree (see figure).
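As a rough illustration of this directory-loading step, the sketch below performs the same kind of transformation directly in Python rather than through an XSLT processor: it pulls sample entries out of a protocol XML document and emits LDIF-style directory entries. The element names, attribute names, DN layout and the facsSample object class are all invented for illustration; a production system would define them in the directory schema and the "directory style" sheets.

```python
import xml.etree.ElementTree as ET

# Hypothetical protocol document, as a data-processing application might emit it.
protocol_xml = """
<protocol study="GSH-01">
  <sample id="s1" subject="p001" tissue="peripheral blood"/>
  <sample id="s2" subject="p002" tissue="bone marrow"/>
</protocol>
"""

def to_ldif(doc, base_dn="ou=samples,o=sciencexchange"):
    """Extract directory entries from a protocol XML document (LDIF text)."""
    entries = []
    root = ET.fromstring(doc)
    for s in root.findall("sample"):
        entries.append("\n".join([
            f"dn: cn={s.get('id')},{base_dn}",
            "objectClass: facsSample",          # hypothetical schema class
            f"subjectId: {s.get('subject')}",
            f"tissue: {s.get('tissue')}",
        ]))
    return "\n\n".join(entries)
```

In the architecture described here, the per-document-type logic would live in an XSLT "directory style" instead of Python code, and a single loader would feed the transformed entries into the LDAP tree.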
  • XSLT can be used to transform a subset of the information in an XML file so that it can be read by a program that takes XML input in a particular format.
  • XSLT can launch the program and pass the result of the transformation during the launch.
  • Using XSLT stylesheets, we can launch an analysis application by transforming an XML file containing the results of a directory search to an application-readable file containing URLs for the data and appropriate annotation information for the analysis. This option can be made available for all co-operating applications and need not be restricted to FACS data.
  • XSLT style sheets can be used to change the form of a document. For example, they can be used to extract the results of analyses and display them as values in the rows or columns of a table.
  • ScienceXchange will develop a reliable, large scale (terabyte level), web accessible, central storage system coupled with small-scale volatile storage deployed locally in a manner transparent to the user.
  • This system will store data and annotation information transmitted from the data collection system.
  • it will catalog the stored data according to selected elements of the structured annotation information and will retain all catalog and annotation information in a searchable format.
  • ScienceXchange will use industry standard formats for storing data and annotation information. If no standard is available, ScienceXchange will publish the interim formats that are used and provide translators to industry standards that become available.
  • Federated directory and data storage - ScienceXchange will capitalize on the built-in replication and referral mechanisms that allow search and retrieval from federated LDAP networks in which information can be automatically replicated, distributed, updated and maintained at strategic locations throughout the Internet.
  • pointers to raw data in LDAP are URLs to data store(s)
  • LDAP provides fine-grained security controls that give the individual user control over individual elements that will be exposed or hidden.
  • the overall issue of security needs to be considered from an Internet perspective. For example, we are currently grappling with the following: should the data be encrypted on the server or only on the wire? Do we need to require (or allow) secure sockets for most operations? What sort of digital signatures, message digest and cryptography algorithms should we use?
  • Retrieval of data for analysis - ScienceXchange will enable retrieval of annotated data sets and transfer to visualization and analysis programs that can use the annotation information to label analysis output, facilitate data interpretation and enable return, storage and retrieval of analysis output within the context of the study and experiment that generated the primary data.
  • a) retrieve annotated data sets (subject to owner-defined accessibility) via catalog browsing and/or structured searches of the catalog; automatically verify authenticity of the data based on the digital signature.
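The authenticity check in (a) might, in its simplest form, compare a digest recomputed from the retrieved data against the one recorded in the catalog. The sketch below uses SHA-256 as a stand-in and invented function names; a real deployment would apply a proper digital signature over the digest, per the open questions raised above.

```python
import hashlib

def digest(data: bytes) -> str:
    """Message digest of a stored data set (SHA-256 as a stand-in)."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, recorded_digest: str) -> bool:
    # compare the catalog's stored digest with one recomputed from the data
    return digest(data) == recorded_digest

# At storage time the catalog records the digest alongside the data URL.
raw = b"FACS data set ..."
catalog_entry = digest(raw)
```

On retrieval, `verify(raw, catalog_entry)` succeeds only if the data is byte-for-byte what was archived, so any corruption or tampering in transit is detected.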
  • Facilities in this program could provide the following: a) Repository for primary data abstracted in publications - a resource to enable direct access to the primary data upon which display items (tables, graphs) in publications are based. The Federal government is considering mandating such access to primary data. b) Library of cell surface expression patterns for types and stages of disease - a resource to enable researchers and clinicians to facilitate diagnoses and definitions of new conditions by comparing locally acquired FACS and other data with resource data acquired from characterized subjects. c) Data source for science education projects - a resource to provide science educators at all levels with standardized data that can be used to teach analysis, data interpretation and diagnosis methods. In addition, it will provide material for student research projects and for examinations.
  • each protocol generator will have to be able to 1) present lists of standardized choices that collectively enable intake of the annotation information necessary for the study; 2) record user selections; 3) provide "type-in" capabilities for items that are not amenable to listing; and 4) provide the ability to transfer the acquired annotation information to the archive index or to a central "information distributor" in the overall system, e.g., for transfer of the relevant components to a data collection module that provides access to certain annotation information during data collection.
  • Phase II. The code produced to meet Phase II goals need not be fully optimized but must be stable enough for beta testing and thus must allow repeated use without crashing. Further, mechanisms for selecting reagents and other types of standardized annotation produced in Phase II must be fully operative but need not provide a complete range of options. Year I will be devoted to determining as many of these options as are deemed useful by the restricted group of alpha testers (Herzenberg laboratory scientists) with whom we will work during this Phase. This list will be extended during year II as the beta test process brings us into contact with a substantially broader group of investigators.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Marketing (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Economics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Bioethics (AREA)
  • General Engineering & Computer Science (AREA)
  • Automatic Analysis And Handling Materials Therefor (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to databases and to the exchange of scientific information. More particularly, it relates to a unified database that allows researchers to easily share their data with other researchers. The method of the invention also facilitates the collection, annotation, storage, management, retrieval and analysis of scientific data through and within the database. It permits direct archiving and retrieval of data collected from laboratory instruments, thereby ensuring data consistency for patent-application or other purposes. It further facilitates data sharing between laboratories at remote locations. Finally, the method of the invention enables the automated creation of experimental protocols.
PCT/US2001/016375 2000-05-19 2001-05-18 Systeme a acces internet permettant le stockage, l'extraction et l'analyse de donnees fondes sur un protocole de repertoire WO2001090951A2 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001263335A AU2001263335A1 (en) 2000-05-19 2001-05-18 An internet-linked system for directory protocol based data storage, retrieval and analysis

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US20548900P 2000-05-19 2000-05-19
US60/205,489 2000-05-19

Publications (3)

Publication Number Publication Date
WO2001090951A2 true WO2001090951A2 (fr) 2001-11-29
WO2001090951A9 WO2001090951A9 (fr) 2003-09-18
WO2001090951A3 WO2001090951A3 (fr) 2004-08-05

Family

ID=22762394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/016375 WO2001090951A2 (fr) 2000-05-19 2001-05-18 Systeme a acces internet permettant le stockage, l'extraction et l'analyse de donnees fondes sur un protocole de repertoire

Country Status (2)

Country Link
AU (1) AU2001263335A1 (fr)
WO (1) WO2001090951A2 (fr)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6611817B1 (en) 1999-06-17 2003-08-26 International Business Machines Corporation Automated technique for code generation of datastream mappings
WO2003102854A2 (fr) * 2002-06-04 2003-12-11 Applera Corporation Systeme et procede de commande en circuit ouvert et de suivi d'instruments biologiques
EP1461674A2 (fr) * 2001-12-03 2004-09-29 Dyax Corporation Criblage de banques
WO2005020123A2 (fr) * 2003-08-15 2005-03-03 Applera Corporation Systeme d'information pour recherche en biologie et sciences de la vie
EP1538537A2 (fr) * 2003-12-01 2005-06-08 Accelrys Software, Inc. Méthode de stockage de données d'expérimentation à haut débit dans une base de données
WO2005062207A1 (fr) * 2003-12-22 2005-07-07 Salvatore Pappalardo Procede evolue de recherche, de redaction et d'edition de fichiers electroniques
US6947953B2 (en) 1999-11-05 2005-09-20 The Board Of Trustees Of The Leland Stanford Junior University Internet-linked system for directory protocol based data storage, retrieval and analysis
EP1852815A1 (fr) * 2006-05-05 2007-11-07 Lockheed Martin Corporation Systèmes et procédés pour contrôler l'accès aux registres électroniques dans un système d'archive
US20090063259A1 (en) * 2003-08-15 2009-03-05 Ramin Cyrus Information system for biological and life sciences research
US7555492B2 (en) 1999-11-05 2009-06-30 The Board Of Trustees At The Leland Stanford Junior University System and method for internet-accessible tools and knowledge base for protocol design, metadata capture and laboratory experiment management
US9026371B2 (en) 2003-09-19 2015-05-05 Applied Biosystems, Llc Method for cross-instrument comparison of gene expression data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000028437A1 (fr) * 1998-11-06 2000-05-18 Lumen Directory protocol based data storage

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BEN-BASSAT M ET AL: "A Hierarchical Modular Design for Treatment Protocols" DIALOG EMBASE, 1980, XP002913397 *
GORDON GOOD: "The LDAP Data Interchange Format (LDIF) - Technical Specification <draft-good-ldap-ldif-05.txt>" IETF DRAFT, [Online] 19 October 1999 (1999-10-19), pages 1-16, XP002267329 Retrieved from the Internet: <URL:http://www.watersprings.org/pub/id/draft-good-ldap-ldif-05.txt> [retrieved on 2004-01-19] *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6611817B1 (en) 1999-06-17 2003-08-26 International Business Machines Corporation Automated technique for code generation of datastream mappings
US7555492B2 (en) 1999-11-05 2009-06-30 The Board Of Trustees At The Leland Stanford Junior University System and method for internet-accessible tools and knowledge base for protocol design, metadata capture and laboratory experiment management
US6947953B2 (en) 1999-11-05 2005-09-20 The Board Of Trustees Of The Leland Stanford Junior University Internet-linked system for directory protocol based data storage, retrieval and analysis
EP1461674A2 (fr) * 2001-12-03 2004-09-29 Library screening
EP1461674A4 (fr) * 2001-12-03 2007-10-24 Library screening
WO2003102854A2 (fr) * 2002-06-04 2003-12-11 Applera Corporation System and method for open control and monitoring of biological instruments
WO2003102854A3 (fr) * 2002-06-04 2004-07-22 Applera Corp System and method for open control and monitoring of biological instruments
US7491367B2 (en) 2002-06-04 2009-02-17 Applera Corporation System and method for providing a standardized state interface for instrumentation
US6909974B2 (en) 2002-06-04 2005-06-21 Applera Corporation System and method for discovery of biological instruments
US7379823B2 (en) 2002-06-04 2008-05-27 Applera Corporation System and method for discovery of biological instruments
US7379821B2 (en) 2002-06-04 2008-05-27 Applera Corporation System and method for open control and monitoring of biological instruments
US7680605B2 (en) 2002-06-04 2010-03-16 Applied Biosystems, Llc System and method for discovery of biological instruments
WO2005020123A2 (fr) * 2003-08-15 2005-03-03 Applera Corporation Information system for biological and life sciences research
WO2005020123A3 (fr) * 2003-08-15 2005-09-09 Applera Corp Information system for biological and life sciences research
US20090063259A1 (en) * 2003-08-15 2009-03-05 Ramin Cyrus Information system for biological and life sciences research
US9026371B2 (en) 2003-09-19 2015-05-05 Applied Biosystems, Llc Method for cross-instrument comparison of gene expression data
EP1538537A2 (fr) * 2003-12-01 2005-06-08 Accelrys Software, Inc. Method for storing high-throughput experimentation data in a database
EP1538537A3 (fr) * 2003-12-01 2006-11-02 Accelrys Software, Inc. Method for storing high-throughput experimentation data in a database
WO2005062207A1 (fr) * 2003-12-22 2005-07-07 Salvatore Pappalardo Advanced method for searching, drafting and editing electronic files
EP1852815A1 (fr) * 2006-05-05 2007-11-07 Lockheed Martin Corporation Systems and methods for controlling access to electronic records in an archives system
US8726351B2 (en) 2006-05-05 2014-05-13 Lockheed Martin Corporation Systems and methods for controlling access to electronic records in an archives system

Also Published As

Publication number Publication date
AU2001263335A1 (en) 2001-12-03
WO2001090951A3 (fr) 2004-08-05
WO2001090951A9 (fr) 2003-09-18

Similar Documents

Publication Publication Date Title
US6947953B2 (en) Internet-linked system for directory protocol based data storage, retrieval and analysis
US7555492B2 (en) System and method for internet-accessible tools and knowledge base for protocol design, metadata capture and laboratory experiment management
Lacroix et al. Bioinformatics: managing scientific data
Kotecha et al. Web‐based analysis and publication of flow cytometry experiments
US6675166B2 (en) Integrated multidimensional database
US20020178185A1 (en) Database model, tools and methods for organizing information across external information objects
US20030233365A1 (en) System and method for semantics driven data processing
US20050021877A1 (en) Information management system for managing workflows
Shaker et al. The biomediator system as a tool for integrating biologic databases on the web
WO2001090951A2 (fr) Internet-linked system for directory protocol based data storage, retrieval and analysis
Musen et al. Modeling community standards for metadata as templates makes data FAIR
Charbonneau et al. Making Common Fund data more findable: catalyzing a data ecosystem
Bandrowski et al. A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework
WO2000028437A1 (fr) Directory protocol based data storage
Arefolov et al. Implementation of the FAIR data principles for exploratory biomarker data from clinical trials
Husser et al. Standardization of microarray and pharmacogenomics data
WO2004097585A2 (fr) System and method for internet-accessible tools and knowledge base for protocol design, metadata capture and laboratory experiment management
Heras Integrating constitutional cytogenetic test result reports into electronic health records
Conley [37] Internet information on ion channels: Issues of access and organization
Premanand StocksDB: Design and Development of a Database Application for the Management of Biological Stocks
Bechini et al. Management of genotyping-related documents by integrated use of semantic tagging
Blinov et al. Modeling without borders: creating and annotating VCell Models using the web
Pitzl Design and development of a database and retrieval system for research in cellular aging
Grant A Microarray Database
JP2003526133A6 (ja) Method and apparatus for providing an expression data mining database and laboratory information management

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
COP Corrected version of pamphlet

Free format text: PAGES 1-60, DESCRIPTION, REPLACED BY NEW PAGES 1-77; PAGES 61-66, CLAIMS, REPLACED BY NEW PAGES 78-83; PAGES 1/5-5/5, DRAWINGS, REPLACED BY NEW PAGES 1/5-5/5; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

NENP Non-entry into the national phase

Ref country code: JP