WO1996023078A1 - Computerized storage and analysis of microbiological data - Google Patents

Info

Publication number
WO1996023078A1
WO1996023078A1 PCT/US1995/012429 US9512429W WO9623078A1 WO 1996023078 A1 WO1996023078 A1 WO 1996023078A1 US 9512429 W US9512429 W US 9512429W WO 9623078 A1 WO9623078 A1 WO 9623078A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
database
storing
tables
attribute
Application number
PCT/US1995/012429
Other languages
English (en)
Inventor
Jeffrey J. Seilhamer
Angelo Delegeane
Randal W. Scott
Original Assignee
Incyte Pharmaceuticals, Inc.
Priority claimed from PCT/US1995/001160 (WO1995020681A1)
Application filed by Incyte Pharmaceuticals, Inc.
Priority to NZ294720A
Priority to AU37590/95A (AU692626B2)
Priority to JP8522835A (JPH11501741A)
Priority to EP95935663A (EP0805874A4)
Publication of WO1996023078A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • G16B50/10 Ontologies; Annotations

Definitions

  • This invention relates to computer database technology applied to genetic data and corresponding cell information. More specifically, a relational database system that stores DNA sequences, the corresponding source data, and other related scientific data is disclosed.
  • A relational database can be characterized as a system for storing data represented as a plurality of tables.
  • Each row of a table, also referred to as a tuple, represents a record of information.
  • A column is essentially a collection of values for the same field of the stored records.
  • Each column is also referred to as an attribute of the stored records.
  • Each record in a given table of a relational database includes a set of fields that correspond to the attributes of the table.
  • The set of all values from which the actual values of an attribute can be drawn is referred to as a domain.
  • “A crucial feature of relational data structure is that associations between tuples (rows) are represented solely by data values in columns drawn from a common domain.”
  • The invention provides a relational database for storing biological information.
  • The relational database is organized as a collection of tables, each of which stores specific records of biological information.
  • The records are interrelated so that each table includes a column in common with at least one other table.
  • The database contains cDNA sequencing data and corresponding match logs indicating the correlation between presently identified cDNA sequences and previously known sequences.
  • A variety of tables in the database store historical data related to the identification of a particular cDNA sequence.
  • Such tables include the identification of the biological source; cell culture and treatment data; mRNA preparation data; cDNA construction data; and clone preparation data, including tables for inoculation, preparation, fluorometer data, and excision.
  • The interrelated information in the database enables the design of various queries useful in scientific analysis and other applications. For example, abundance analysis, which determines the frequency with which an RNA transcript appears within a certain source tissue, can be performed using the database of the preferred embodiment. Other analytical results that have previously been obtained using laboratory chemical techniques can be determined using database queries; one such application is subtraction analysis.
BRIEF DESCRIPTION OF THE DRAWINGS
  • Fig. 1 symbolically depicts the overall architecture of the system of the preferred embodiment of the present invention.
  • Fig. 2 is a flowchart symbolically depicting the process of cloning and sequencing cDNAs.
  • Figs. 3A, 3B and 4-10 illustrate portions of the biological relational database of the preferred embodiment of the present invention.
  • Fig. 11 illustrates an example of the output of an abundance analysis query of the relational database of the preferred embodiment.
  • Fig. 12 illustrates an example of the output of a subtraction analysis query using the database of the preferred embodiment.
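The summary names abundance analysis and subtraction analysis as queries the database supports, but this section does not spell them out. The sketch below is a minimal illustration under stated assumptions: a hypothetical, flattened table `clone_match(library, gene)` with one row per sequenced clone, abundance taken as each gene's share of a library's clones, and subtraction taken as the genes seen in one library but absent from another. Python's built-in `sqlite3` module stands in for the database; none of these names come from the patent.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical, simplified table of clone-to-gene matches."""
    conn = sqlite3.connect(":memory:")
    # One row per sequenced clone: the library it came from and the gene it matched.
    conn.execute("CREATE TABLE clone_match (library TEXT, gene TEXT)")
    rows = [
        ("liver", "actin"), ("liver", "actin"),
        ("liver", "albumin"), ("liver", "albumin"), ("liver", "albumin"),
        ("brain", "actin"), ("brain", "tubulin"),
    ]
    conn.executemany("INSERT INTO clone_match VALUES (?, ?)", rows)
    return conn


def abundance(conn: sqlite3.Connection, library: str) -> list:
    """Clones per gene in one library, with each gene's share of the library."""
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM clone_match WHERE library = ?", (library,)
    ).fetchone()
    rows = conn.execute(
        "SELECT gene, COUNT(*) AS n FROM clone_match WHERE library = ? "
        "GROUP BY gene ORDER BY n DESC, gene",
        (library,),
    ).fetchall()
    return [(gene, n, n / total) for gene, n in rows]


def subtract(conn: sqlite3.Connection, library_a: str, library_b: str) -> list:
    """Genes observed in library_a but never in library_b."""
    rows = conn.execute(
        "SELECT gene FROM clone_match WHERE library = ? "
        "EXCEPT SELECT gene FROM clone_match WHERE library = ? "
        "ORDER BY gene",
        (library_a, library_b),
    ).fetchall()
    return [gene for (gene,) in rows]


conn = build_demo_db()
print(abundance(conn, "liver"))          # [('albumin', 3, 0.6), ('actin', 2, 0.4)]
print(subtract(conn, "liver", "brain"))  # ['albumin']
```

In the full schema, the library and gene for each clone would come from joins through the library preparation, clone log, sequence archive and match log tables rather than from one flat table.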
  • A system for storing, tracking and manipulating the genetic data is organized as a relational database.
  • Users of the system at their workstations (6 and 7) can access one or more relational databases via an integrated Ethernet network 5.
  • The workstations (6, 7) are typically personal computers known in the art that usually include data entry means, output devices, a display, a CPU, memory (RAM and ROM) and interfaces to network 5.
  • Database storage 1 represents the database of the preferred embodiment of the present invention, which is stored on a file server connected to network 5.
  • Reference databases 3 represent sources of data which may, for example, be searched as part of the use of database 1.
  • These databases may include, for example, other sequence, nucleic acid, protein, and motif databases.
  • Each cell in an organism, such as the human body, contains a complete set of genes or genetic information. These genes are either active or inactive at different times in the cell's life cycle. Some genes are active in all cells and are necessary for normal and common functions, or housekeeping duties. Other genes are active only in a particular cell type, because they specify and regulate functions peculiar to a tissue or an organ under normal conditions.
  • Still other genes are activated only in response to stress or disease. Some stress genes, which are activated in several cell types, respond to a general alarm. Other stress genes are very specific and are activated only in a particular cell type. Thus genes can be grouped into very small and specific subsets or into larger subsets of varying size. The classification and understanding of these nested sets of genes are important in the diagnosis and treatment of disease.
  • Genes, or double-stranded deoxyribonucleic acid (DNA), are activated by the transcription, or copying, of the sense strand of the DNA molecule into single-stranded messenger ribonucleic acid (mRNA).
  • mRNA: messenger ribonucleic acid.
  • The message inherent in the mRNA sequence is subsequently translated into amino acids, the molecular building blocks of the polypeptides or proteins that function structurally or enzymatically in the cell.
  • The activities taking place at any one time, and the relative importance of those activities, are reflected in the numbers of mRNA molecules found in the cell.
  • Some RNAs (housekeeping) are always present, and their numbers remain fairly stable in normal cells of any tissue.
  • Housekeeping mRNAs, such as those encoding actin, represent and carry out the constant background activity essential to most cell types (the exception is a mature, differentiated red blood cell, which lacks DNA but has a set of mRNAs or enzymes that function for the remainder of its life).
  • The RNAs (routine) which carry out the duties of a particular cell type are activated only in that cell type, and the numbers of routine mRNAs are stable under normal conditions. If that cell type is stressed or exposed to disease, the numbers of routine mRNAs fluctuate as genes which respond to the stress or disease are activated. These stress/disease mRNAs have priority over other routine or housekeeping mRNAs, and they quickly increase in number.
  • For example, the housekeeping genes of brain cells and liver cells are shared; cells from both organs transcribe the mRNAs that produce the enzymes required to process incoming glucose molecules.
  • In contrast, the mRNAs that make proteins for the normal functions of a pituitary cell are different from the mRNAs of a liver Kupffer cell, although each cell is functioning normally.
  • Likewise, the set of mRNAs from a diseased liver cell differs from that of a normal liver cell. In each case, a different and diverse subset of mRNAs characterizes the cell in a particular situation at a particular time.
  • The database of the preferred embodiment provides for the storage, manipulation, and retrieval of information relating to the classification and characterization of unique populations of mRNAs. On the basis of this information, scientists can diagnose diseases and design specific treatments. The wealth of detailed information provides clues to earlier diagnosis and treatment, which contribute to rapid healing and help avoid permanent impairment or death.
  • The database system of the present invention takes advantage of the powerful capabilities of modern computers by storing genetic information in association with a large amount of related information. More specifically, in the preferred embodiment, the information on essentially all the steps of obtaining tissue, extracting transcripts, cloning, and identifying cDNA sequences is stored in various relational tables. Thus, the database allows one to backtrack through the steps performed in the laboratory in identifying a cDNA sequence.
  • Fig. 2 illustrates the steps of preparing genetic data stored in the database of the present invention.
  • The information associated with the steps of Fig. 2 is stored in the database as tables depicted in Figs. 3A, 3B and 4 through 10.
  • The first step, 10, is cell preparation.
  • Cell preparation 10 includes the steps of obtaining and growing the cells so as to prepare them for RNA extraction.
  • Step 20 comprises the processes associated with extracting mRNA from the cells.
  • At step 30, the mRNA is converted into cDNA.
  • Alternatively, the cDNA fragment can be received from an outside source or collaborator without performing steps 10 and 20.
  • Once the cDNA molecule is obtained, it is cloned at step 40 and sequenced at step 50.
  • The sequence obtained at step 50 is then compared at step 60 to known sequences in genetic databases.
  • Finally, the function of the DNA sequence is determined at step 70.
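The Fig. 2 workflow, including the option to skip steps 10 and 20 when the cDNA is supplied from outside, can be summarized as an ordered enumeration. This is an illustrative Python sketch: the step names paraphrase the text, and the grouping of table reference numerals under each step is our reading of the table descriptions in this section, not a mapping stated in the patent.

```python
from enum import IntEnum


class Step(IntEnum):
    """Laboratory steps of Fig. 2; names paraphrase the text."""
    CELL_PREPARATION = 10
    MRNA_EXTRACTION = 20
    CDNA_CONSTRUCTION = 30
    CLONING = 40
    SEQUENCING = 50
    SEQUENCE_COMPARISON = 60
    FUNCTION_IDENTIFICATION = 70


# Reference numerals of the tables that hold each step's data (our grouping).
TABLES = {
    Step.CELL_PREPARATION: [110, 120, 130, 140, 180],
    Step.MRNA_EXTRACTION: [150],
    Step.CDNA_CONSTRUCTION: [160, 170],
    Step.CLONING: [190, 200, 210, 220, 230, 240, 250],
    Step.SEQUENCING: [260, 270, 280, 290, 300],
    Step.SEQUENCE_COMPARISON: [510, 515],
    Step.FUNCTION_IDENTIFICATION: [720, 730, 750, 760, 780],
}


def steps_to_run(cdna_supplied_externally: bool) -> list:
    """Steps performed for one cDNA, in order.

    When the cDNA comes from an outside source or collaborator, steps 10 and 20
    are skipped; step 30 then records the received cDNA (supplier table 160).
    """
    skipped = {Step.CELL_PREPARATION, Step.MRNA_EXTRACTION} if cdna_supplied_externally else set()
    return [step for step in Step if step not in skipped]


for step in steps_to_run(cdna_supplied_externally=True):
    print(step.value, step.name, TABLES[step])
```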
  • Figs. 3A, 3B and 4-10 schematically illustrate the tables of the database of the preferred embodiment.
  • Exemplary fields (or attributes) are depicted within each box, and each table includes an attribute having a domain which is common to at least one other table.
  • For example, the common domain of the Biological Source table 130 and the Cell Culture/Treatment table 140 is bio_source_ID.
  • Arrow 135, one end of which is labelled “1” and the other end “M”, indicates that for each tuple in the Biological Source table there may be more than one tuple in the Cell Culture/Treatment table.
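The common-domain rule and the one-to-many arrow 135 can be illustrated with two tables in SQLite, via Python's `sqlite3`. This is a minimal sketch that keeps only a handful of the attributes named in the text (`bio_source_ID`, `tissue`, `pathology`, `culture_ID`, `treatment`); column types, the declared foreign key and the sample rows are assumptions. The association between a source and its cultures lives only in the shared `bio_source_ID` values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
CREATE TABLE biological_source (
    bio_source_ID INTEGER PRIMARY KEY,   -- the "1" side of arrow 135
    tissue        TEXT,
    pathology     TEXT
);
CREATE TABLE cell_culture_treatment (
    culture_ID    INTEGER PRIMARY KEY,
    -- the "M" side: many cultures may share one bio_source_ID
    bio_source_ID INTEGER NOT NULL REFERENCES biological_source (bio_source_ID),
    treatment     TEXT
);
INSERT INTO biological_source VALUES (1, 'pituitary', 'normal');
INSERT INTO cell_culture_treatment VALUES (10, 1, 'control'), (11, 1, 'activated');
""")


def cultures_for_source(conn, bio_source_id):
    """Join the two tables on their common attribute, bio_source_ID."""
    return conn.execute(
        """
        SELECT c.culture_ID, c.treatment
        FROM biological_source AS b
        JOIN cell_culture_treatment AS c ON c.bio_source_ID = b.bio_source_ID
        WHERE b.bio_source_ID = ?
        ORDER BY c.culture_ID
        """,
        (bio_source_id,),
    ).fetchall()


def rejects_orphan(conn):
    """True if a culture pointing at a nonexistent source is refused."""
    try:
        conn.execute("INSERT INTO cell_culture_treatment VALUES (99, 42, 'x')")
    except sqlite3.IntegrityError:
        return True
    return False


print(cultures_for_source(conn, 1))  # [(10, 'control'), (11, 'activated')]
```

Declaring the shared column as a foreign key goes beyond what the text requires, but it lets the database refuse a culture record that names a nonexistent source; SQLite checks such constraints only while `PRAGMA foreign_keys` is enabled.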
  • The data received and obtained in steps 10-30 of Fig. 2 is stored in the Library Preparation portion of the database (Figs. 3A and 3B).
  • This data includes information relating to the biological source of the cells used to obtain the cDNA (boxes 130, 110, 120), cell culture and treatment (boxes 140, 180), mRNA preparation (box 150) and cDNA construction (boxes 170, 160).
  • Box 130 depicts the table for storing the biological source information.
  • The source may be cells grown in tissue culture or cells obtained during surgery from a single individual or a pooled sample, e.g., pituitary glands obtained from patients of both sexes and a range of ages.
  • The biological source table 130 contains attributes such as tissue, organ, gender, age and pathology, as depicted in Fig. 3A.
  • The biological source may reflect a normal, treated or diseased state.
  • A person skilled in the art will realize that other biological source information can be stored if desired and, on the basis of this disclosure, will be able to include other relevant attributes.
  • Data regarding the collaborators, i.e., contributors of a biological source, is stored in table 110 as depicted in Fig. 3A, and information regarding the cell suppliers contributing biological sources is stored in table 120.
  • The source_ID attribute of the biological source table 130 corresponds to either the collaborator_ID of table 110 or the supplier_ID of table 120.
  • Part of the cell preparation procedure includes the cell culture and treatment process.
  • Cell culture is carried out in containers of known size or volume. Density is usually reported as cells per milliliter (of liquid media) and is monitored to maintain a healthy cell culture. Density at the time cells are harvested may be measured either as cell number or as grams per liter. Treatment may vary. Induction with a chemical can change a cell from an immature form, monocyte, to a mature one, macrophage. Stimulation or activation with a different chemical causes the macrophage to ingest and digest invading bacteria.
  • A cell culture is split into two or more parts, with one subsample maintained in its normal growth mode (as the biological control) and the other subsample(s) subjected to activation and/or stimulation.
  • Alternatively, a subsample of control cells is compared with a subsample of cells treated with a drug candidate.
  • Drug doses and length of treatment may vary.
  • The cell culture and treatment information is stored in table 140 in Fig. 3A.
  • The attributes of the cell culture/treatment table 140 of the preferred embodiment are listed in table 140 and include such information as cell density, cell quantity, and treatment.
  • The cell culture/treatment table 140 has the attribute bio_source_ID in common with table 130.
  • Step 20, mRNA preparation, begins with the extraction of total ribonucleic acid (RNA) from cells of a known weight or volume according to a standard protocol. The protocol and any modifications are recorded. The extracted RNA is optionally fractionated to recover the messenger or transcript RNA (mRNA); if it is fractionated, the yield is calculated as a percentage (mRNA/total RNA).
  • mRNA: messenger or transcript RNA.
  • The normal function of mRNAs in the cell is to produce peptides or proteins.
  • Spectrophotometry and gel appearance are used to check the quality of the mRNA.
  • An optical density ratio of 1.8, derived from readings at 260 nm and 280 nm (260/280), indicates high-quality RNA that is not unduly contaminated with DNA or proteins.
  • A subsample of this mRNA is checked further by moving it through an agarose gel with an electric current (electrophoresis).
  • The gel is examined visually for contaminating DNA, which generally migrates with higher-molecular-weight substances than the RNA, or for degraded mRNA, which forms a fuzzy rather than a sharp band or signal.
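The yield and spectrophotometric checks of step 20 reduce to simple arithmetic. The sketch below is a hypothetical record type, not a table from the patent: field names and units are assumed, and it treats the cited 260/280 ratio of 1.8 as a minimum acceptable value, which is an interpretation. The visual gel check is not modeled.

```python
from dataclasses import dataclass


@dataclass
class MrnaPrep:
    """Hypothetical record of the mRNA preparation measurements."""
    total_rna_ug: float  # total RNA extracted, in micrograms (unit assumed)
    mrna_ug: float       # mRNA recovered after fractionation, same unit
    a260: float          # optical density at 260 nm
    a280: float          # optical density at 280 nm

    def yield_percent(self) -> float:
        # Yield as defined in the text: mRNA / total RNA, expressed as a percentage.
        return 100.0 * self.mrna_ug / self.total_rna_ug

    def od_ratio(self) -> float:
        return self.a260 / self.a280

    def passes_od_check(self, minimum: float = 1.8) -> bool:
        # The text cites a 260/280 ratio of 1.8 as indicating high-quality RNA;
        # treating 1.8 as a lower bound is an assumption of this sketch.
        return self.od_ratio() >= minimum


prep = MrnaPrep(total_rna_ug=200.0, mrna_ug=4.0, a260=1.0, a280=0.5)
print(prep.yield_percent(), prep.od_ratio(), prep.passes_od_check())  # 2.0 2.0 True
```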
  • The data related to the mRNA preparation is stored in table 150 in Fig. 3B.
  • Table 150 has an attribute mRNA_source_ID, which correlates with either the culture_ID attribute of table 140 or the bio_source_ID attribute of table 130, and an attribute mRNA source, which identifies the table with which mRNA_source_ID correlates. In combination, these two attributes therefore link records of table 150 to tables 140 and 130.
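Table 150 uses a pair of attributes, an ID plus a field naming the target table, so that one column can point into either of two tables. A minimal SQLite sketch of that pattern follows, assuming the discriminator column is spelled `mRNA_source` and takes the hypothetical values `'culture'` and `'bio_source'`; the sample rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE biological_source (bio_source_ID INTEGER PRIMARY KEY, tissue TEXT);
CREATE TABLE cell_culture_treatment (
    culture_ID INTEGER PRIMARY KEY, bio_source_ID INTEGER, treatment TEXT);
CREATE TABLE mrna_preparation (
    mRNA_prep_ID   INTEGER PRIMARY KEY,
    mRNA_source_ID INTEGER NOT NULL,
    -- Discriminator naming the table that mRNA_source_ID refers to.
    -- A single column cannot be a declared foreign key to two tables,
    -- so the CHECK only restricts the allowed table names.
    mRNA_source    TEXT NOT NULL CHECK (mRNA_source IN ('culture', 'bio_source'))
);
INSERT INTO biological_source VALUES (1, 'pituitary'), (2, 'blood');
INSERT INTO cell_culture_treatment VALUES (5, 2, 'activated');
INSERT INTO mrna_preparation VALUES (100, 5, 'culture'), (101, 1, 'bio_source');
""")


def bio_source_of_prep(conn, prep_id):
    """Resolve an mRNA preparation to its biological source, whichever table it names."""
    row = conn.execute(
        """
        SELECT CASE m.mRNA_source
                 WHEN 'culture' THEN c.bio_source_ID
                 ELSE m.mRNA_source_ID
               END
        FROM mrna_preparation AS m
        LEFT JOIN cell_culture_treatment AS c
               ON m.mRNA_source = 'culture' AND c.culture_ID = m.mRNA_source_ID
        WHERE m.mRNA_prep_ID = ?
        """,
        (prep_id,),
    ).fetchone()
    return None if row is None else row[0]


print(bio_source_of_prep(conn, 100), bio_source_of_prep(conn, 101))  # 2 1
```

Because `mRNA_source_ID` cannot reference two tables at once, referential integrity for this pattern has to be enforced by application code or triggers. A similar situation arises for source_ID in table 130, which may refer to either a collaborator or a supplier, and the Express Link table 370 uses the same ID-plus-table-name pair (Log_entity_ID and Log_table_name).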
  • A cDNA sequence is then derived from the mRNA.
  • cDNA construction requires the conversion of mRNA into complementary DNA (cDNA), preferably using oligo(dT) priming, random priming, reverse transcription or other protocols known in the art.
  • Useful cloning sites are designed into the bacteriophage into which the DNA is packaged or incorporated. Packaging or plating efficiency is determined by examining the number of primary plaques, i.e., individual bacterial colonies, which resulted from a particular experiment. Information is recorded about the genetic background of host bacterium and the titer of the bacteriophage, before and after amplification.
  • The quality of the library is determined by screening for the actin gene, which is present in all normal and diseased cell types, and by estimating the size of the cDNA fragment that has been inserted (insert size).
  • The data related to the cDNA construction is stored in table 170 in Fig. 3B.
  • The attributes of this table, depicted in Fig. 3B, provide detailed information about the cDNA construction.
  • Tables 170 and 150 have a common attribute, mRNA_prep_ID.
  • Preprocessed cDNA fragments can also be purchased from an outside supplier or obtained from a collaborator or customer. In such a case, the relevant data is stored in the cDNA supplier table 160.
  • Table 160 has the attribute supplier_ID, which is also part of the cDNA construction table 170.
  • The portion of the database depicted in Fig. 4 relates to the clone preparation data obtained during the cloning process and includes information relating to excision (box 190), inoculation (box 200), preparation (box 210) and fluorometer data (boxes 220, 230, 240). Cloning includes the steps of excision, inoculation and preparation.
  • Excision is the removal of the cDNA fragment from the vector. It follows an overnight cultivation and induced amplification of the vector in the SOLR bacterial host cells that make up each culture.
  • The plasmid DNA is separated from the bacterial DNA and quantitated fluorometrically before sequencing.
  • The table that stores data related to excision is illustrated as 190 in Fig. 4.
  • The excision table 190 has an attribute cDNA_const_ID in common with the cDNA construction table 170.
  • Inoculation involves growing up, or increasing the number of, bacteria in a liquid growth medium. As soon as the required cell density (optimum growth) is reached, the culture is plated (streaked or spread thinly) on solid growth media. Individual colonies which arise on the surface of this solid medium may be subcultured as pure cultures in tubes or microtiter plate wells of liquid media. The collection of bacterial cultures corresponds to the numbers and types of genes which were active in the source tissue.
  • The data that relates to inoculation is stored in the table illustrated as 200.
  • The attribute plating_ID of table 200 is common with the same attribute in table 190.
  • Fluorometers are used to quantitate the cDNA in nanograms or micrograms per microliter. The total amount of cDNA must be determined to calculate the amount which will be processed and separated electrophoretically in any particular lane of a sequencing gel. The remainder of the sample is stored for future use. Fluorimetry procedures determine cDNA purity and help predict performance in subsequent procedures.
  • The fluorometer information is stored in the tables illustrated as 220, 230, and 240. More specifically, the data from the fluorometer analysis is stored as the attributes of fluorometer log table 220.
  • Table 230 (Fluorometer) stores the information regarding the instrument and, as illustrated in Fig. 4, has an attribute fluorometer_ID in common with table 220.
  • The fluorometer calibration table 240 is associated with the fluorometer table 230 via a common calibration_ID attribute.
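The point that the total cDNA amount must be known before deciding how much to load is simple arithmetic: the total amount equals concentration times volume, and the volume to load equals the desired mass divided by the concentration. The Python sketch below assumes concentrations in ng/µL and a caller-chosen load mass; the function name and the `None` convention for an insufficient sample are hypothetical.

```python
def plan_load(conc_ng_per_ul: float, volume_ul: float, load_ng: float):
    """Split a quantitated sample into the portion to load and the remainder.

    Returns (total_ng, load_ul, remaining_ul), or None when the sample holds
    less DNA than one load requires. Units (ng, uL) are assumptions.
    """
    total_ng = conc_ng_per_ul * volume_ul  # concentration x volume
    if total_ng < load_ng:
        return None
    load_ul = load_ng / conc_ng_per_ul     # volume that contains load_ng
    return total_ng, load_ul, volume_ul - load_ul  # remainder is stored for later use


print(plan_load(50.0, 20.0, 250.0))  # (1000.0, 5.0, 15.0)
print(plan_load(5.0, 20.0, 250.0))   # None
```

A `None` result loosely corresponds to the situation the clone log's dead_or_alive attribute records, where a plasmid preparation does not yield enough DNA to sequence.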
  • The cDNAs are then prepared for sequencing. The preparation is recorded along with the methods (and their modifications) used at that time. The scientists (SWAT) troubleshoot the sequencing process and track the results of their custom protocols.
  • The preparation table is illustrated as 210.
  • Table 250, the clone log, combines the information regarding the cloning process, as illustrated in Fig. 4. In particular, it contains an attribute Inoculation_ID, which is also an attribute of the inoculation table 200; an attribute clone_ID, which is shared with the fluorometer log table 220; and an attribute Preparation_ID, which is also part of the preparation table 210.
  • The dead_or_alive attribute of the clone log table 250, for example, identifies dead clones, in which the plasmid preparation did not yield enough DNA to sequence.
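Following the shared attributes named for Figs. 3B and 4 (Inoculation_ID, plating_ID, cDNA_const_ID and mRNA_prep_ID), a clone can be traced back toward its library, which is the backtracking ability the description emphasizes. The SQLite sketch below models only those linking columns; which table holds each key as its primary key, and the sample IDs, are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Only the linking attributes named in the text are modeled; which side holds
-- each key as a primary key is an assumption.
CREATE TABLE cdna_construction (cDNA_const_ID INTEGER PRIMARY KEY, mRNA_prep_ID INTEGER);
CREATE TABLE excision (plating_ID INTEGER PRIMARY KEY, cDNA_const_ID INTEGER);
CREATE TABLE inoculation (Inoculation_ID INTEGER PRIMARY KEY, plating_ID INTEGER);
CREATE TABLE clone_log (clone_ID INTEGER PRIMARY KEY, Inoculation_ID INTEGER);
INSERT INTO cdna_construction VALUES (7, 100);
INSERT INTO excision VALUES (3, 7);
INSERT INTO inoculation VALUES (42, 3);
INSERT INTO clone_log VALUES (9001, 42);
""")


def trace_clone(conn, clone_id):
    """Walk from a clone back through inoculation, excision and cDNA construction."""
    return conn.execute(
        """
        SELECT cl.clone_ID, i.Inoculation_ID, e.plating_ID,
               cc.cDNA_const_ID, cc.mRNA_prep_ID
        FROM clone_log AS cl
        JOIN inoculation AS i        ON i.Inoculation_ID = cl.Inoculation_ID
        JOIN excision AS e           ON e.plating_ID = i.plating_ID
        JOIN cdna_construction AS cc ON cc.cDNA_const_ID = e.cDNA_const_ID
        WHERE cl.clone_ID = ?
        """,
        (clone_id,),
    ).fetchone()


print(trace_clone(conn, 9001))  # (9001, 42, 3, 7, 100)
```

One more hop, using the mRNA source attributes of table 150, would reach the cell culture or biological source record.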
  • The data related to the sequencing process is stored in the sequencing portion of the database illustrated in Fig. 5.
  • This portion includes information relating to the specifications of the sequence and related information: the sequencing log (box 300), the sequencing gel (box 280), the reaction set (box 270) and the sequence archive (box 290).
  • The specifications of the sequence and related information are stored as attributes in sequencing log table 300. It should be noted that a clone can be sequenced multiple times.
  • Table 260 (sequencing link) links the clone log table 250 with the sequencing log table 300.
  • The sequencing link table 260 contains a clone_ID attribute, which is in common with the same attribute in the clone log table 250, and a sequencing_log_ID attribute, which is also included in table 300.
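Because a clone can be sequenced multiple times, the sequencing link table 260 sits between the clone log and the sequencing log. A minimal SQLite sketch of such a link table follows; only the two ID attributes named in the text are modeled, and the composite primary key and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clone_log (clone_ID INTEGER PRIMARY KEY);
CREATE TABLE sequencing_log (sequencing_log_ID INTEGER PRIMARY KEY);
-- Sequencing link table 260: one row per (clone, sequencing run) pair.
CREATE TABLE sequencing_link (
    clone_ID          INTEGER NOT NULL REFERENCES clone_log (clone_ID),
    sequencing_log_ID INTEGER NOT NULL REFERENCES sequencing_log (sequencing_log_ID),
    PRIMARY KEY (clone_ID, sequencing_log_ID)
);
INSERT INTO clone_log VALUES (1), (2), (3);
INSERT INTO sequencing_log VALUES (10), (11), (12);
INSERT INTO sequencing_link VALUES (1, 10), (1, 11), (2, 12);
""")


def runs_per_clone(conn):
    """Sequencing runs recorded for every clone, including clones never sequenced."""
    return conn.execute(
        """
        SELECT cl.clone_ID, COUNT(sl.sequencing_log_ID)
        FROM clone_log AS cl
        LEFT JOIN sequencing_link AS sl ON sl.clone_ID = cl.clone_ID
        GROUP BY cl.clone_ID
        ORDER BY cl.clone_ID
        """
    ).fetchall()


print(runs_per_clone(conn))  # [(1, 2), (2, 1), (3, 0)]
```

A link table also leaves room for a sequencing run that relates to more than one clone, which a plain foreign key from the sequencing log to the clone log would not allow.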
  • Sequencing of the cDNAs is performed on an automated ABI system.
  • The sequencing gel is evaluated for the sharpness and darkness of the signal displayed by each of the deoxyribonucleotides or bases (adenine, cytosine, guanine, and thymine), their physical proximity to one another in the gel, and the clarity of the gel background. These characteristics must fall within certain parameters for the automatic gel reader to produce a sequence.
  • An electronic chromatogram, or gel representation, is stored in the computer system for future reference.
  • All gel information is tracked by a gel key.
  • The gel, the conditions under which it was run, the time required for the gel run, the individual machine/instrument used, the staff and the biological preparation are recorded whether or not a usable sequence is obtained.
  • This data is stored in the gel key table 280, which has an attribute Gel_key_ID in common with the same attribute in the sequence log table 300.
  • The biological preparation that runs on the sequencing gel is referred to as the reaction set.
  • The Catalyst is the Model 800 Molecular Biology Station, in which robots perform amplifications, PCRs, dilutions and additions of fluorescent dyes to the cDNAs.
  • The data related to the reaction set is stored in table 270. This table has an attribute Reaction_Set_ID, which is also part of the sequence log table 300.
  • The sequence archive is activated if a sequence is obtained.
  • The sequence is rated as normal or variant and evaluated for usefulness and subsequent storage in the computer system database. Variant sequences identified at this time may be designated express (see discussion below).
  • The sequence archive data is stored in table 290, which has the sequence_ID attribute in common with the Sequence Log table 300.
  • Fig. 6 illustrates a portion of the database for storing information regarding the sequencing equipment.
  • The Sequencer Maintenance Log table 900 collects information on the maintenance of each DNA sequencing machine, which, via the relational database, can be related back to any DNA sequence.
  • The Sequencer Maintenance Log table 900 is linked with the Gel Key table 280 via the common attribute instrument_number.
  • Table 900 includes such information as the date service was requested, the date service/maintenance was performed, the nature of the problem, the staff involved in maintenance and pertinent comments.
  • The Catalyst and Computer Maintenance Log tables (905 and 910, respectively) are linked through the computer_ID attribute, include information similar to that of the Sequencer Maintenance Log, and can be related to essentially any DNA sequence.
  • The Equipment Log table 915 connects with Maintenance tables 900-910 via the instrument_number and computer_ID attributes and holds information on the equipment or instruments used in the sequencing operation.
  • Table 915 stores information regarding equipment name and serial number, vendor identifier, and date installed.
  • A separate Vendor table 920 connects with the Equipment Log table 915 via the vendor_identifier attribute and stores, for example, the company name, address, phone number, fax number and contact person.
  • The vendor listing can also hold additional information on the vendor, including an e-mail address and the date the contract was signed.
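Relating maintenance records back to a DNA sequence amounts to a chain of joins: from a sequence to its sequencing log entry, to the gel key (which records the instrument used), to the maintenance log for that instrument. The SQLite sketch below uses the attribute names given in the text (`sequence_ID`, `Gel_key_ID`, `instrument_number`); the `date_performed` and `problem` column names and all rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequencing_log (
    sequencing_log_ID INTEGER PRIMARY KEY, sequence_ID INTEGER, Gel_key_ID INTEGER);
CREATE TABLE gel_key (Gel_key_ID INTEGER PRIMARY KEY, instrument_number TEXT);
CREATE TABLE sequencer_maintenance_log (
    instrument_number TEXT, date_performed TEXT, problem TEXT);
-- Placeholder rows for illustration only.
INSERT INTO gel_key VALUES (1, 'INSTR-1'), (2, 'INSTR-2');
INSERT INTO sequencing_log VALUES (100, 555, 1), (101, 556, 2);
INSERT INTO sequencer_maintenance_log VALUES
    ('INSTR-1', '2000-01-10', 'problem A'),
    ('INSTR-1', '2000-02-20', 'problem B'),
    ('INSTR-2', '2000-01-15', 'problem C');
""")


def maintenance_for_sequence(conn, sequence_id):
    """Maintenance history of the sequencer(s) whose gels produced a given sequence."""
    return conn.execute(
        """
        SELECT DISTINCT m.instrument_number, m.date_performed, m.problem
        FROM sequencing_log AS s
        JOIN gel_key AS g ON g.Gel_key_ID = s.Gel_key_ID
        JOIN sequencer_maintenance_log AS m ON m.instrument_number = g.instrument_number
        WHERE s.sequence_ID = ?
        ORDER BY m.date_performed
        """,
        (sequence_id,),
    ).fetchall()


print(maintenance_for_sequence(conn, 555))
# [('INSTR-1', '2000-01-10', 'problem A'), ('INSTR-1', '2000-02-20', 'problem B')]
```

`DISTINCT` guards against duplicate rows when the same sequence was run more than once on one instrument; ISO-formatted date strings sort chronologically as text.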
  • Fig. 7 illustrates a portion of the database of the preferred embodiment for storing information regarding the sequencing reagents.
  • The Gel Link table 925 links to the Gel Key table 280 via the Gel_key_ID attribute and to the Gel Solution table 935 via the gel_solution_ID attribute.
  • The Gel Solution table 935 includes information on the gel solution, including the date the solution was made and who prepared it.
  • The Gel Solution-Lot Link table 950 links to the Gel Solution table 935 via the gel_solution_ID attribute and also includes lot_number and reagent_ID attributes, which are shared with the Lot table 965.
  • The Reaction-Cocktail Link table 930 shares the reaction_set_ID attribute with the Reaction Set table 270.
  • The Reaction-Cocktail Link table 930 also shares the cocktail_ID attribute with the Cocktail table 940.
  • The Cocktail table 940 also records the date the cocktail was made and the staff person who made it.
  • The Cocktail-Lot Link table 955 has the cocktail_ID attribute in common with the Cocktail table 940 and the lot_number and reagent_ID attributes in common with the Lot table 965.
  • The Lot table 965 includes the reagent ID and lot number, vendor identifier, date received and date used.
  • The vendor_ID attribute is shared with the Vendor table 960.
  • A separate Reagent table 970 shares the reagent_ID attribute with the Lot table 965 and also holds an expanded reagent name.
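Because the Lot table is shared with the link tables through both `lot_number` and `reagent_ID`, a lot appears to be identified by the pair, and tracing which reagent lots went into a given reaction set means joining on both columns. A minimal SQLite sketch under that assumption; the table spellings follow the text loosely and all rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reagent (reagent_ID TEXT PRIMARY KEY, reagent_name TEXT);
CREATE TABLE lot (
    reagent_ID TEXT, lot_number TEXT, vendor_ID TEXT,
    PRIMARY KEY (reagent_ID, lot_number)   -- a lot is identified by reagent and lot number
);
CREATE TABLE cocktail_lot_link (cocktail_ID INTEGER, reagent_ID TEXT, lot_number TEXT);
CREATE TABLE reaction_cocktail_link (reaction_set_ID INTEGER, cocktail_ID INTEGER);
-- Placeholder rows; lot number 'L-001' is deliberately reused by two reagents.
INSERT INTO reagent VALUES ('R1', 'reagent one'), ('R2', 'reagent two');
INSERT INTO lot VALUES ('R1', 'L-001', 'V1'), ('R1', 'L-002', 'V1'), ('R2', 'L-001', 'V2');
INSERT INTO cocktail_lot_link VALUES (1, 'R1', 'L-002'), (1, 'R2', 'L-001');
INSERT INTO reaction_cocktail_link VALUES (77, 1);
""")


def lots_in_reaction_set(conn, reaction_set_id):
    """Reagent lots that went into a reaction set, via its cocktails."""
    return conn.execute(
        """
        SELECT r.reagent_name, l.lot_number, l.vendor_ID
        FROM reaction_cocktail_link AS rc
        JOIN cocktail_lot_link AS cl ON cl.cocktail_ID = rc.cocktail_ID
        -- Join on both columns: a lot number alone may not be unique across reagents.
        JOIN lot AS l ON l.reagent_ID = cl.reagent_ID AND l.lot_number = cl.lot_number
        JOIN reagent AS r ON r.reagent_ID = l.reagent_ID
        WHERE rc.reaction_set_ID = ?
        ORDER BY r.reagent_name
        """,
        (reaction_set_id,),
    ).fetchall()


print(lots_in_reaction_set(conn, 77))
# [('reagent one', 'L-002', 'V1'), ('reagent two', 'L-001', 'V2')]
```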
  • Experimental sets of sequences may be stored in the database in the express sets portion shown in Fig. 8.
  • This portion includes an express link table 370, a clone variant table 380, an experimental table 390, a clean up table 400 and a resequencing table 410.
  • Express Link table 370 stores sequence sets which have higher priority. They are given unique identifiers and handled separately from the batch process materials.
  • Clone Variant table 380 refers to variant sequences flagged by an individual investigator. The variants are evaluated by that scientist, collaborator, or customer and appropriate action is taken.
  • The experimental sequences stored in Experimental table 390 are similar to the variants above. They may be homologous, allelic or mutant sequences which have been flagged by a particular scientist.
  • Cleanup table 400 stores data reflecting the addition of extra steps to the protocol. The longer procedure is designed to improve readability of the sequence. Resequencing is simply repeating the procedure in order to check a sequence or to obtain more data. Information regarding resequencing is stored in Resequencing table 410.
  • Express Link table 370 contains a clone_ID attribute which is also included in the Clone Log table 250. Attribute log_entity_ID of the table 370 provides a correlation with variant_ID, experimental_set_ID, cleanUp_set_ID, and resequencing_set_ID of the tables 380, 390, 400, 410 respectively.
  • The Log_table_name attribute of table 370 identifies the table to which Log_entity_ID refers.
  • Each cDNA sequence obtained in step 50 is then compared to the known sequences in the genetic databases to identify it, if possible. This process involves comparing sequences (a) within a data set, (b) within the internal database and/or (c) with external databases. Since the library represents the frequency with which an RNA transcript appears within a certain source tissue, several different clones may contain all or part of the same gene or its allele(s). The computer also determines insert size by counting the individual nucleotides in the sequence.
  • Data relating to sequence comparison is stored in tables in the sequence comparison portion of the database shown in Fig. 7. These tables include a first sequence match log table 510 and a second sequence match log table 515.
  • The database of the present invention may also access external databases. Genetic databases may contain DNA or protein sequences. Such database services may also provide searching or matching tools in addition to named DNAs, proteins or fragments thereof. As illustrated in Fig. 7, such outside databases include the GenBank database (box 610), the ProDom database (box 570), the Blocks database (box 580), the Pisearch database (box 590) and the Sites database (box 600).
  • The GenBank database is used as a primary source of known genes, sequences and other information against which the sequences stored in the database are compared.
  • Percent identity and probability are both considered to determine whether such fragments may be categorized as "exact" (apparently identical to a known/named human sequence) or homologous (partially related) to a gene identified in humans or another species. Unique and unidentified fragments or sequences are listed by an identifier.
  • The ProDom, Blocks, and Pisearch databases may be accessed to determine whether a particular sequence contains functional protein domains or motifs. The patterns may provide important structural information for a peptide or protein encoded by the sequence.
  • Vectors database 520 stores the DNA sequences of the vectors used to clone the cDNAs. By comparing the identified cDNA sequences to the sequences in this database, vector sequences or stretches of vector sequences that show up in a cDNA sequence can be delimited.
  • The Repeats database 530 allows repeats which belong to a multigene family, such as Alu, to be identified.
  • Hidden Markov database 560 contains software which looks at a nucleotide sequence alignment and computes a predicted peptide structure from that sequence. As shown in Box 550 of Fig. 9, other databases which provide additional features can also be accessed.
  • When a sequence comparison results in a match, the information regarding that match is stored in Sequence Match Log tables 510 and 515. This information generally includes address information for the matching sequence record in the external database, as well as scores which represent the quality of the match. In an alternative embodiment, it may be preferable to store the scores in a separate record, since the scoring methods are not identical for all databases. Sequence Match Log 510 is linked to sequence archive 290 by the shared attribute sequence_ID. The first Sequence Match Log 510 contains the better matches, while marginal matches are stored in the second Sequence Match Log 515; both tables have identical attributes. Function identification, illustrated as step 70 in Fig. 2, is then performed on matches whose quality is above a specific threshold.
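The split between the first and second Sequence Match Log tables can be pictured as routing each match by a quality threshold. The sketch below is a hypothetical Python illustration: since the text notes that scoring methods differ between databases, it keeps a separate threshold per database, and the threshold values, accession strings and field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One match record; both match log tables share these attributes (names assumed)."""
    sequence_ID: int
    database: str    # external database searched, e.g. "GenBank"
    accession: str   # address of the matching record in that database
    score: float     # match quality, on that database's own scale


def route_matches(matches, thresholds):
    """Split matches into (better -> table 510, marginal -> table 515).

    Scores are only compared with the threshold for their own database,
    because scoring methods differ between databases.
    """
    better, marginal = [], []
    for m in matches:
        (better if m.score >= thresholds[m.database] else marginal).append(m)
    return better, marginal


# Hypothetical thresholds and records for illustration only.
thresholds = {"GenBank": 100.0, "Blocks": 0.8}
matches = [
    Match(1, "GenBank", "ACC-1", 250.0),
    Match(1, "GenBank", "ACC-2", 40.0),
    Match(1, "Blocks", "BLK-1", 0.9),
]
better, marginal = route_matches(matches, thresholds)
print([m.accession for m in better], [m.accession for m in marginal])
# ['ACC-1', 'BLK-1'] ['ACC-2']
```

Matches in the better list would be the natural candidates for function identification (step 70), although the text does not say whether that threshold is the same one used to split the two logs.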
  • The data related to function identification is stored in the tables shown in Fig. 10. These tables include a Protein table 720, a Protein-Sequence Link table 730, a Folder table 760, and a Location table 780. Protein identification may come from any of the function/domain databases. The GenBank location or locus and the international EC number (enzyme or protein classification) are stored in table 720. Each entry in this table corresponds to one or more sequences from the Sequence Archive table whose function was conclusively identified. Protein table 720 is linked to Sequence Archive table 290 via Protein-Sequence Link table 730.
  • Protein table 720 has the attribute protein_ID in common with Protein-Sequence Link table 730; and Sequence Archive table 290 has the attribute sequence_ID in common with Protein-Sequence Link table 730.
  • Each entry in folder table 760 contains unstructured annotations for one or more sequences from the archive table which had interesting but inconclusive matches with the other databases. Any type of annotation, footnote, or remark can be recorded in the folder table 760. This permits the researcher to store desired information without contaminating other records in the database with information from inconclusive matches.
  • Folder table 760 is linked to Sequence Archive 290 via Function Sequence Link 750.
  • Function Sequence Link 750 has the attribute Folder_ID in common with Folder table 760 and the attribute Sequence_ID in common with Sequence Archive 290.
  • The present invention permits a researcher to search the relational database using keywords and to specify the table(s) in which the keyword search should be performed.
  • For example, a researcher could query the database for all occurrences of the word "endothelial" in Biological Source table 130.
  • The present invention also allows the researcher to store queries in Keywords table 790, shown in Fig. 10.
  • Each query stored in this table is identified by a unique Keyword_ID.
  • When a stored query is run, the computer retrieves the associated record and searches the table(s) identified in the Table_name field for the keyword(s) stored in the Keyword_text field.
  • The results of the search can be delivered to the user, for example via e-mail notification, as shown in boxes 800-820 of Fig. 10.
  • Location table 780 stores information regarding the location within the cell of each identified sequence.
  • Location table 780 is linked to Protein table 720 by the common attribute Protein_ID and stores the location information in an attribute called "Location."
  • The domain for this attribute consists of these categories: nuclear, cytoplasmic (cytoskeleton), cytoplasmic (intracellular membranes), cytoplasmic (mitochondria), cell surface, and secreted.
  • A GDB Links table 770 links Protein table 720 to the Human Genome Database.
  • GDB links table 770 has attribute Protein_ID in common with Protein table 720 and links to the Human Genome Database via attribute GDB_ID.
  • The relational database of the preferred embodiment is well suited to abundance analysis.
  • This analysis provides a user with the relative frequency of mRNAs or transcripts found in a particular cell in a given state, e.g., normal or activated. For example, if a researcher were to input a query requesting the most abundant sequences in an LPS-activated THP-1 cell, the computer system is programmed to search the relational database and output to the user a display such as that illustrated in Fig. 11.
  • The search is performed as follows. First, the cell culture/treatment records 140 in which the cell_line_name field equals "THP-1" (in this example) are identified. Next, the identified records are searched for those in which the treatment field equals "LPS." Then, the Sequence Match Log records 510 correlated in the database with this subset of identified records are determined, and the number of Sequence Match Log records for each distinct match ID value is counted to determine the abundance in the cell of the particular sequence identified by that match ID number. After the computer has examined all the biological source records, it sorts the abundance information in the manner requested in the specific query and displays it as a chart, as exemplified in Fig. 11. Similarly, the database structure described above provides a convenient way to implement subtraction analysis.
  • Subtraction analysis determines which sequences are expressed more commonly in an activated cell than in a normal cell.
  • Abundance analysis is performed for the normal cell library and for the activated cell library, and the ratio of the resulting values is then determined.
  • Fig. 12 exemplifies the output of such an operation for normal versus LPS-activated THP-1.
  • Location analysis can also be performed.
  • In this case the user requests, for example, the location of a specific protein within a particular activated macrophage.
  • The computer identifies the subset of records associated with the desired cell in the manner described above, consults the associated records in Protein table 720 to verify that the protein is present in the cell, and finally looks up the location of the protein in Location table 780 and outputs it to the user.
  • The sequence location categories in the preferred embodiment are nuclear, cytoplasmic, cell surface, or secreted. Within the cytoplasm, sequences may be assigned to the cytoskeleton, intracellular membranes, or mitochondria. This information is provided in the Location field of Location table 780. All unidentified sequences, regardless of their relative abundance, are assigned by default to the unknown category.
  • Yet another function supported by the database of the preferred embodiment is distribution analysis. This function determines, for example, in which tissues or organs a given sequence is found and how frequently. The system steps through the records in Sequencing Log 300 and, whenever a record matches the desired sequence, determines through the relational associations of the database the organ and tissue in which that sequence was found. After all the sequences have been examined, an output is prepared representing the requested distribution statistics.
  • The detailed records and relational structure of the database allow the researcher to access practically any field reflecting a step in the mRNA-to-cDNA sequencing process.
  • The database of the present invention therefore provides a powerful tool for analyzing test results as well as testing procedures.
  • For example, the sequencing results associated with a given lot number can be obtained by stepping through mRNA Preparation records 150, finding the records with the desired lot number, and outputting the related entries in the sequencing log.
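The categorization of matches as "exact," homologous, or unique, described above as using both percent identity and probability, can be sketched as follows. This is a minimal illustration, not the patented procedure: the numeric cut-offs (`EXACT_MIN_IDENTITY`, `HOMOLOG_MIN_IDENTITY`, `MAX_PROBABILITY`) are hypothetical, since the text gives none, and "probability" is assumed to be a BLAST-style value where smaller means more significant.

```python
from enum import Enum

# Hypothetical cut-offs: the description says percent identity and probability
# are both considered but gives no numeric thresholds.
EXACT_MIN_IDENTITY = 98.0    # percent
HOMOLOG_MIN_IDENTITY = 70.0  # percent
MAX_PROBABILITY = 1e-10      # BLAST-style probability; smaller = more significant


class MatchCategory(Enum):
    EXACT = "exact"            # apparently identical to a known/named sequence
    HOMOLOGOUS = "homologous"  # partially related to a known gene
    UNIQUE = "unique"          # no significant match; listed by an identifier


def categorize(percent_identity: float, probability: float) -> MatchCategory:
    """Classify one database hit using both percent identity and probability."""
    # A hit that is not statistically significant is not trusted, however
    # high its identity over a short stretch may be.
    if probability > MAX_PROBABILITY:
        return MatchCategory.UNIQUE
    if percent_identity >= EXACT_MIN_IDENTITY:
        return MatchCategory.EXACT
    if percent_identity >= HOMOLOG_MIN_IDENTITY:
        return MatchCategory.HOMOLOGOUS
    return MatchCategory.UNIQUE
```

Checking the probability first reflects the idea that identity alone is not sufficient: a short, high-identity hit with a poor probability is still treated as unidentified.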
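The text above says that comparing cDNA sequences against Vectors database 520 lets vector stretches be delimited. The sketch below is one simple way to do that, using exact k-mer matching. It is a swapped-in technique for illustration only; production tools use alignment. The function name and the default `k = 12` are hypothetical.

```python
def vector_spans(cdna: str, vector: str, k: int = 12) -> list[tuple[int, int]]:
    """Return half-open (start, end) spans of `cdna` covered by exact
    k-mer matches to `vector` (a simplification of real vector screening)."""
    cdna, vector = cdna.upper(), vector.upper()
    vector_kmers = {vector[i:i + k] for i in range(len(vector) - k + 1)}
    covered = [False] * len(cdna)
    # Mark every cDNA position that lies inside a k-mer also found in the vector.
    for i in range(len(cdna) - k + 1):
        if cdna[i:i + k] in vector_kmers:
            for j in range(i, i + k):
                covered[j] = True
    # Merge runs of covered positions into spans.
    spans = []
    start = None
    for pos, is_vector in enumerate(covered):
        if is_vector and start is None:
            start = pos
        elif not is_vector and start is not None:
            spans.append((start, pos))
            start = None
    if start is not None:
        spans.append((start, len(cdna)))
    return spans
```

The returned spans mark where vector sequence "shows up" in a read, so the remaining positions can be treated as the cloned insert.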
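The handling of matches described above can be sketched as follows: hits go either to the first Sequence Match Log (510, better matches) or to the second (515, marginal matches), and function identification then selects matches above a threshold. This is a hypothetical model. The class and field names, and the use of one numeric `score` per hit, are assumptions. The text notes that an alternative embodiment stores scores separately because scoring differs between databases.

```python
from dataclasses import dataclass, field


@dataclass
class MatchRecord:
    sequence_id: int        # shared attribute linking back to Sequence Archive 290
    external_db: str        # database in which the hit was found
    external_address: str   # address of the matching record in that database
    score: float            # quality of the match (assumed single numeric score)


@dataclass
class MatchLogs:
    # Both logs hold records with identical attributes, as in the text.
    better: list[MatchRecord] = field(default_factory=list)    # Sequence Match Log 510
    marginal: list[MatchRecord] = field(default_factory=list)  # Sequence Match Log 515


def log_match(logs: MatchLogs, record: MatchRecord, better_threshold: float) -> None:
    """Store a hit in the first log if it is a good match, else in the second."""
    target = logs.better if record.score >= better_threshold else logs.marginal
    target.append(record)


def function_id_candidates(logs: MatchLogs, function_threshold: float) -> list[MatchRecord]:
    """Matches from either log whose quality exceeds the function-identification threshold."""
    return [m for m in logs.better + logs.marginal if m.score > function_threshold]
```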
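The function-identification tables of Fig. 10 described above (Protein, Protein-Sequence Link, Folder, Function Sequence Link, Location, and GDB Links) can be sketched as a small relational schema. The location lookup, with its default "unknown" category, is included as well. This is a hypothetical SQLite rendering for illustration: table and column names are derived from the attributes named in the text, and the actual column types and keys are assumptions.

```python
import sqlite3

# Hypothetical SQLite rendering of the Fig. 10 tables named in the text.
FUNCTION_SCHEMA = """
CREATE TABLE sequence_archive (sequence_id INTEGER PRIMARY KEY, sequence TEXT);
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    genbank_locus TEXT,
    ec_number TEXT);
CREATE TABLE protein_sequence_link (
    protein_id INTEGER REFERENCES protein(protein_id),
    sequence_id INTEGER REFERENCES sequence_archive(sequence_id),
    PRIMARY KEY (protein_id, sequence_id));
CREATE TABLE folder (folder_id INTEGER PRIMARY KEY, annotation TEXT);
CREATE TABLE function_sequence_link (
    folder_id INTEGER REFERENCES folder(folder_id),
    sequence_id INTEGER REFERENCES sequence_archive(sequence_id));
CREATE TABLE location (
    protein_id INTEGER REFERENCES protein(protein_id),
    location TEXT CHECK (location IN (
        'nuclear', 'cytoplasmic (cytoskeleton)',
        'cytoplasmic (intracellular membranes)', 'cytoplasmic (mitochondria)',
        'cell surface', 'secreted')));
CREATE TABLE gdb_links (
    protein_id INTEGER REFERENCES protein(protein_id),
    gdb_id TEXT);
"""


def locations_for_sequence(conn: sqlite3.Connection, sequence_id: int) -> list[str]:
    """Follow Sequence Archive -> Protein-Sequence Link -> Location."""
    rows = conn.execute(
        """SELECT l.location
           FROM protein_sequence_link AS psl
           JOIN location AS l ON l.protein_id = psl.protein_id
           WHERE psl.sequence_id = ?
           ORDER BY l.location""",
        (sequence_id,),
    ).fetchall()
    # Sequences with no identified protein fall into the default "unknown" category.
    return [loc for (loc,) in rows] or ["unknown"]
```

The `CHECK` constraint encodes the attribute domain listed in the text, so an out-of-domain location cannot be stored.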
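The stored keyword queries described above, where a Keywords record names a table and keyword text and the computer searches that table, can be sketched as follows. This is a minimal SQLite illustration under stated assumptions. The lower-case table and column names and the `searchable_tables` whitelist are hypothetical. Every column of the named table is searched, and e-mail delivery of the results is left out.

```python
import sqlite3


def run_stored_query(conn: sqlite3.Connection, keyword_id: int,
                     searchable_tables: set[str]) -> list[tuple]:
    """Run the keyword query stored under `keyword_id` in the keywords table."""
    row = conn.execute(
        "SELECT table_name, keyword_text FROM keywords WHERE keyword_id = ?",
        (keyword_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no stored query with keyword_id {keyword_id}")
    table, keyword = row
    # Table names cannot be bound as SQL parameters, so only whitelisted
    # tables are interpolated into the statement.
    if table not in searchable_tables:
        raise ValueError(f"table {table!r} is not searchable")
    columns = [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]
    # SQLite's LIKE is case-insensitive for ASCII, so "Endothelial" also matches.
    where = " OR ".join(f'CAST("{col}" AS TEXT) LIKE ?' for col in columns)
    params = [f"%{keyword}%"] * len(columns)
    return conn.execute(f'SELECT * FROM "{table}" WHERE {where}', params).fetchall()
```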
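The abundance search described above (filter cell culture/treatment records by cell line and treatment, follow them to the Sequence Match Log, and count records per match ID) can be expressed as a single grouped SQL query. This is a sketch on a hypothetical, simplified schema. The chain of preparation, library, and clone tables that links a culture record to its sequences in the real database is collapsed here into an assumed `culture_id` column.

```python
import sqlite3

# Hypothetical, simplified schema: the intermediate preparation tables
# between a culture record and its sequences are collapsed into culture_id.
ABUNDANCE_SCHEMA = """
CREATE TABLE cell_culture (
    culture_id INTEGER PRIMARY KEY, cell_line_name TEXT, treatment TEXT);
CREATE TABLE sequence_archive (
    sequence_id INTEGER PRIMARY KEY,
    culture_id INTEGER REFERENCES cell_culture(culture_id));
CREATE TABLE sequence_match_log (
    sequence_id INTEGER REFERENCES sequence_archive(sequence_id),
    match_id INTEGER);
"""


def abundance(conn: sqlite3.Connection, cell_line_name: str,
              treatment: str) -> list[tuple[int, int]]:
    """Count Sequence Match Log records per match ID for one cell line and treatment."""
    return conn.execute(
        """SELECT m.match_id, COUNT(*) AS n
           FROM cell_culture AS c
           JOIN sequence_archive AS s ON s.culture_id = c.culture_id
           JOIN sequence_match_log AS m ON m.sequence_id = s.sequence_id
           WHERE c.cell_line_name = ? AND c.treatment = ?
           GROUP BY m.match_id
           ORDER BY n DESC, m.match_id""",
        (cell_line_name, treatment),
    ).fetchall()
```

Sorting by descending count gives the "most abundant sequences" ordering of the example query; a different `ORDER BY` would serve other requested sort orders.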
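Subtraction analysis, described above as the ratio of abundance values from a normal and an activated cell library, can be sketched as follows. Abundance counts keyed by match ID are the input. The `pseudocount` added to both counts is an assumption not found in the text. It is used here only so that sequences absent from the normal library do not cause division by zero, and it slightly shrinks every ratio toward 1.

```python
from collections import Counter


def subtraction_ratios(normal: Counter, activated: Counter,
                       pseudocount: float = 1.0) -> list[tuple[int, float]]:
    """Activated-to-normal abundance ratio for every match ID seen in either library."""
    match_ids = set(normal) | set(activated)
    # Counter returns 0 for a missing key, so absent sequences need no special case.
    ratios = {
        m: (activated[m] + pseudocount) / (normal[m] + pseudocount)
        for m in match_ids
    }
    # Most strongly up-regulated first; ties broken by match ID for stable output.
    return sorted(ratios.items(), key=lambda item: (-item[1], item[0]))
```

Given per-library results as `(match_id, count)` rows, for example from a grouped abundance query, each library can be wrapped as `Counter(dict(rows))` before calling this function.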
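The distribution function described above steps through the Sequencing Log and tallies the organ and tissue of every record that matches the requested sequence. It can be sketched as follows. The flat `SequencingLogEntry` record is hypothetical. It assumes the organ and tissue have already been resolved through the relational links to the biological-source tables.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class SequencingLogEntry(NamedTuple):
    sequence_id: int
    match_id: int  # identity assigned through the Sequence Match Log
    organ: str     # assumed already resolved through the biological-source relations
    tissue: str


def distribution(log: Iterable[SequencingLogEntry], match_id: int) -> Counter:
    """Count, per (organ, tissue), how often the requested sequence was found."""
    return Counter((e.organ, e.tissue) for e in log if e.match_id == match_id)
```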
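The lot-number trace described above (step through the mRNA Preparation records, keep those with the desired lot number, and output the related sequencing-log entries) can be sketched as a join. The tables and columns below are hypothetical. The text names the mRNA preparation records (150), a lot number, and the sequencing log but not their layout. In the real database the link from an mRNA preparation to its sequencing entries would pass through the library and clone tables, which are collapsed here into an assumed `prep_id` column.

```python
import sqlite3

# Hypothetical tables and columns; intermediate library/clone tables are omitted.
TRACE_SCHEMA = """
CREATE TABLE mrna_preparation (prep_id INTEGER PRIMARY KEY, lot_number TEXT);
CREATE TABLE sequencing_log (
    log_id INTEGER PRIMARY KEY,
    prep_id INTEGER REFERENCES mrna_preparation(prep_id),
    sequence_id INTEGER,
    read_length INTEGER);
"""


def sequencing_entries_for_lot(conn: sqlite3.Connection,
                               lot_number: str) -> list[tuple[int, int, int]]:
    """Sequencing-log entries derived from mRNA preparations with the given lot number."""
    return conn.execute(
        """SELECT s.log_id, s.sequence_id, s.read_length
           FROM mrna_preparation AS p
           JOIN sequencing_log AS s ON s.prep_id = p.prep_id
           WHERE p.lot_number = ?
           ORDER BY s.log_id""",
        (lot_number,),
    ).fetchall()
```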

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The invention relates to a relational database (1) for storing microbiological information. The database contains complementary DNA sequencing data (290, 300) and corresponding logs (510, 515) indicating a correlation between cDNA sequences being identified and already known sequences. Various tables making up the database are also used to store historical data relating to the identification of a particular cDNA sequence. These tables include biological source data (130), cell culture and treatment data (140), messenger RNA production data (150), cDNA construction data (170), and clone preparation data. They further include tables containing data related to inoculation (200), preparation (210), excision (190), and a fluorimeter (220, 230, 240). The information correlated in the database allows various queries to be used to extract data for scientific analyses and other applications. For example, abundance analysis can be performed using the database of the preferred embodiment to determine how frequently a particular RNA transcript appears in a given tissue source.
PCT/US1995/012429 1995-01-27 1995-09-06 Memorisation et analyse informatiques de donnees microbiologiques WO1996023078A1 (fr)

Priority Applications (4)

Application Number Priority Date Filing Date Title
NZ294720A NZ294720A (en) 1995-01-27 1995-09-06 Computer system storing and analyzing microbiological data
AU37590/95A AU692626B2 (en) 1995-01-27 1995-09-06 Computer system storing and analyzing microbiological data
JP8522835A JPH11501741A (ja) 1995-01-27 1995-09-06 微生物学的データを保存し解析するコンピュータシステム
EP95935663A EP0805874A4 (fr) 1995-01-27 1995-09-06 Memorisation et analyse informatiques de donnees microbiologiques

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
USPCT/US95/01160 1995-01-27
PCT/US1995/001160 WO1995020681A1 (fr) 1994-01-27 1995-01-27 Analyse comparative de produits de transcription geniques

Publications (1)

Publication Number Publication Date
WO1996023078A1 true WO1996023078A1 (fr) 1996-08-01

Family

ID=22248580

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1995/012429 WO1996023078A1 (fr) 1995-01-27 1995-09-06 Memorisation et analyse informatiques de donnees microbiologiques

Country Status (6)

Country Link
EP (1) EP0805874A4 (fr)
JP (1) JPH11501741A (fr)
AU (1) AU692626B2 (fr)
CA (1) CA2210731A1 (fr)
NZ (1) NZ294720A (fr)
WO (1) WO1996023078A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6507788B1 (en) * 1999-02-25 2003-01-14 Société de Conseils de Recherches et D'Applications Scientifiques (S.C.R.A.S.) Rational selection of putative peptides from identified nucleotide, or peptide sequences, of unknown function
US6611833B1 (en) * 1999-06-23 2003-08-26 Tissueinformatics, Inc. Methods for profiling and classifying tissue using a database that includes indices representative of a tissue population

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU660690B2 (en) * 1991-08-27 1995-07-06 Orchid Biosciences Europe Limited Method of characterisation
EP0582755A1 (fr) * 1992-08-02 1994-02-16 SOFT GENE, ENTWICKLGS. U. VERTRIEBSGES. F. MOLEKULARBIOLOGISCHE SOFTWARE mbH Procédé pour stocker des données dans des bases de données

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5371671A (en) * 1990-03-13 1994-12-06 The Regents Of The University Of California DNA sequence autoradiogram digitizer and methodology implemented in the same
US5364759A (en) * 1991-01-31 1994-11-15 Baylor College Of Medicine DNA typing with short tandem repeat polymorphisms and identification of polymorphic short tandem repeats
US5364759B1 (en) * 1991-01-31 1997-11-18 Baylor College Medicine Dna typing with short tandem repeat polymorphisms and indentification of polymorphic short tandem repeats
US5364759B2 (en) * 1991-01-31 1999-07-20 Baylor College Medicine Dna typing with short tandem repeat polymorphisms and identification of polymorphic short tandem repeats

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
COMMUNICATIONS OF THE ACM, Volume 34, No. 11, issued November 1991, K.A. FRENKEL, "The Human Genome Project and Informatics: a Monumental Scientific Adventure", pages 40-52. *
CURRENT BIOLOGY LTD., 1993, K. MATSUBARA et al., "Identification of New Genes by Systematic Analysis of cDNAs and Database Construction", pages 672-677. *
MATHEMATICAL METHODS FOR DNA SEQUENCES, Editor M.S. WATERMAN, Copyright 1989, J.W. FICKETT et al., "Development of a Database for Nucleotide Sequences", pages 2-34. *
NATURE GENETICS, Volume 2, issued November 1992, A.S. KHAN et al., "Single Pass Sequencing and Physical and Genetic Mapping of Human Brain cDNAs", pages 180-185. *
NUCLEIC ACIDS RESEARCH, Volume 19, No. 25, issued 1991, E. HARA et al., "Subtractive cDNA Cloning Using Oligo(dT)30-Latex and PCR: Isolation of cDNA Clones Specific to Undifferentiated Human Embryonal Carcinoma Cells", pages 7097-7104. *
SCIENCE, Volume 252, issued 21 June 1991, M.D. ADAMS et al., "Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project", pages 1651-1656. *
See also references of EP0805874A4 *

Cited By (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6363399B1 (en) 1996-10-10 2002-03-26 Incyte Genomics, Inc. Project-based full-length biomolecular sequence database with expression categories
US6189013B1 (en) 1996-12-12 2001-02-13 Incyte Genomics, Inc. Project-based full length biomolecular sequence database
US5966712A (en) * 1996-12-12 1999-10-12 Incyte Pharmaceuticals, Inc. Database and system for storing, comparing and displaying genomic information
US5970500A (en) * 1996-12-12 1999-10-19 Incyte Pharmaceuticals, Inc. Database and system for determining, storing and displaying gene locus information
US6742004B2 (en) 1996-12-12 2004-05-25 Incyte Genomics, Inc. Database and system for storing, comparing and displaying genomic information
US6553317B1 (en) 1997-03-05 2003-04-22 Incyte Pharmaceuticals, Inc. Relational database and system for storing information relating to biomolecular sequences and reagents
US5966711A (en) * 1997-04-15 1999-10-12 Alpha Gene, Inc. Autonomous intelligent agents for the annotation of genomic databases
US6643634B2 (en) 1997-05-15 2003-11-04 Incyte Genomics, Inc. Graphical viewer for biomolecular sequence data
US6611828B1 (en) 1997-05-15 2003-08-26 Incyte Genomics, Inc. Graphical viewer for biomolecular sequence data
US6567540B2 (en) 1997-07-25 2003-05-20 Affymetrix, Inc. Method and apparatus for providing a bioinformatics database
US6532462B2 (en) 1997-07-25 2003-03-11 Affymetrix, Inc. Gene expression and evaluation system using a filter table with a gene expression database
US6826296B2 (en) 1997-07-25 2004-11-30 Affymetrix, Inc. Method and system for providing a probe array chip design database
US6882742B2 (en) 1997-07-25 2005-04-19 Affymetrix, Inc. Method and apparatus for providing a bioinformatics database
WO1999005323A1 (fr) * 1997-07-25 1999-02-04 Affymetrix, Inc. Systeme d'expression et d'evaluation de genes
US6229911B1 (en) 1997-07-25 2001-05-08 Affymetrix, Inc. Method and apparatus for providing a bioinformatics database
US7215804B2 (en) 1997-07-25 2007-05-08 Affymetrix, Inc. Method and apparatus for providing a bioinformatics database
EP1396800A3 (fr) * 1997-07-25 2006-05-03 Affymetrix, Inc. Procédé et appareil d'obtention d'une base de données bioinformatiques
US6308170B1 (en) 1997-07-25 2001-10-23 Affymetrix Inc. Gene expression and evaluation system
WO1999005324A1 (fr) * 1997-07-25 1999-02-04 Affymetrix, Inc. SYSTEME D'OBTENTION D'UNE BASE DE DONNEES DE POLYMORPHISMES
US7068830B2 (en) 1997-07-25 2006-06-27 Affymetrix, Inc. Method and system for providing a probe array chip design database
WO1999028836A2 (fr) * 1997-11-28 1999-06-10 Cybergene Ab Dispositif et procede d'analyse de sequences de nucleotides
WO1999028836A3 (fr) * 1997-11-28 1999-07-15 Cybergene Ab Dispositif et procede d'analyse de sequences de nucleotides
US6420108B2 (en) 1998-02-09 2002-07-16 Affymetrix, Inc. Computer-aided display for comparative gene expression
US7130746B2 (en) 1998-05-08 2006-10-31 Rosetta Inpharmatics Llc Computer systems and computer programs for determining protein activity levels using gene expression profiles
EP1075673A4 (fr) * 1998-05-08 2006-03-22 Rosetta Inpharmatics Inc Procedes pour determiner des niveaux d'activite d'une proteine au moyen de profils d'expression genetique
EP1075673A1 (fr) * 1998-05-08 2001-02-14 Rosetta Inpharmatics Inc. Procedes pour determiner des niveaux d'activite d'une proteine au moyen de profils d'expression genetique
US6606622B1 (en) * 1998-07-13 2003-08-12 James M. Sorace Software method for the conversion, storage and querying of the data of cellular biological assays on the basis of experimental design
US6185561B1 (en) 1998-09-17 2001-02-06 Affymetrix, Inc. Method and apparatus for providing and expression data mining database
US6687692B1 (en) 1998-09-17 2004-02-03 Affymetrix, Inc. Method and apparatus for providing an expression data mining database
US6489096B1 (en) 1998-10-15 2002-12-03 Princeton University Quantitative analysis of hybridization patterns and intensities in oligonucleotide arrays
WO2000051053A1 (fr) * 1999-02-26 2000-08-31 Gemini Genomics (Uk) Limited Base de donnees clinique et diagnostique
WO2000070556A3 (fr) * 1999-05-19 2001-08-16 Whitehead Biomedical Inst Procede et systeme de gestion de base de donnees relationnelle, permettant de memoriser, comparer et afficher les resultats produits par des analyses de donnees d'ensembles de genes
WO2000070556A2 (fr) * 1999-05-19 2000-11-23 Whitehead Institute For Biomedical Research Procede et systeme de gestion de base de donnees relationnelle, permettant de memoriser, comparer et afficher les resultats produits par des analyses de donnees d'ensembles de genes
WO2001008036A2 (fr) * 1999-07-27 2001-02-01 Cellomics, Inc. Procede et systeme d'enregistrement dynamique et de validation de donnees de recherche
WO2001008036A3 (fr) * 1999-07-27 2004-02-19 Cellomics Inc Procede et systeme d'enregistrement dynamique et de validation de donnees de recherche
US6941317B1 (en) 1999-09-14 2005-09-06 Eragen Biosciences, Inc. Graphical user interface for display and analysis of biological sequence data
WO2001026029A2 (fr) * 1999-10-01 2001-04-12 Orchid Biosciences, Inc. Procede et systeme permettant de fournir sur un reseau informatique des renseignements cliniques sur le genotype
WO2001026029A3 (fr) * 1999-10-01 2002-03-07 Orchid Biosciences Inc Procede et systeme permettant de fournir sur un reseau informatique des renseignements cliniques sur le genotype
US8239212B2 (en) 1999-10-22 2012-08-07 Cerner Innovation, Inc. Genetic profiling and banking system and method
US7844469B2 (en) 1999-10-22 2010-11-30 Cerner Innovation, Inc. Genetic profiling and banking system and method
WO2001059683A3 (fr) * 2000-02-11 2002-05-10 Pangene Corporation Services genomiques integres
WO2001059683A2 (fr) * 2000-02-11 2001-08-16 Pangene Corporation Services genomiques integres
WO2002012434A2 (fr) * 2000-08-10 2002-02-14 Glaxo Group Limited Reseau servant a controler un profil de reponse medicale electronique globale
WO2002012434A3 (fr) * 2000-08-10 2003-08-14 Glaxo Group Ltd Reseau servant a controler un profil de reponse medicale electronique globale
WO2002027035A3 (fr) * 2000-09-28 2003-08-28 Pangene Corporation Clonage de genes et criblage phenotypique a haut rendement
WO2002027035A2 (fr) * 2000-09-28 2002-04-04 Pangene Corporation Clonage de genes et criblage phenotypique a haut rendement
US7054755B2 (en) 2000-10-12 2006-05-30 Iconix Pharmaceuticals, Inc. Interactive correlation of compound information and genomic information
EP1328880A1 (fr) * 2000-10-12 2003-07-23 Iconix Pharmaceuticals, Inc. Correlation interactive de donnees de compose et de donnees genomiques
EP1328880A4 (fr) * 2000-10-12 2004-12-15 Iconix Pharm Inc Correlation interactive de donnees de compose et de donnees genomiques
WO2002065334A1 (fr) * 2001-02-14 2002-08-22 Tibotec Bvba Procede et appareil offrant une gestion d'information automatisee dans le cadre du criblage a haut debit
AU784645B2 (en) * 2001-03-20 2006-05-18 Ortho-Clinical Diagnostics, Inc. Method for providing clinical diagnostic services
EP1244047A3 (fr) * 2001-03-20 2005-06-01 Ortho Clinical Diagnostics Inc. Procédé permettant de fournir des services de diagnostic clinique
WO2003005236A3 (fr) * 2001-07-05 2004-03-25 Lion Bioscience Ag Procede et dispositif pour la creation, le maintien et l'utilisation d'une base de donnees de reference
WO2003005236A2 (fr) * 2001-07-05 2003-01-16 Lion Bioscience Ag Procede et dispositif pour la creation, le maintien et l'utilisation d'une base de donnees de reference
WO2003085083A3 (fr) * 2002-04-01 2004-07-22 Phase 1 Molecular Toxicology I Genes predicteurs de necrose du foie
WO2003085083A2 (fr) * 2002-04-01 2003-10-16 Phase-1 Molecular Toxicology, Inc. Genes predicteurs de necrose du foie
US7588892B2 (en) 2004-07-19 2009-09-15 Entelos, Inc. Reagent sets and gene signatures for renal tubule injury
US7467118B2 (en) 2006-01-12 2008-12-16 Entelos Inc. Adjusted sparse linear programming method for classifying multi-dimensional biological data

Also Published As

Publication number Publication date
AU692626B2 (en) 1998-06-11
CA2210731A1 (fr) 1996-08-01
JPH11501741A (ja) 1999-02-09
EP0805874A4 (fr) 1998-05-20
AU3759095A (en) 1996-08-14
EP0805874A1 (fr) 1997-11-12
NZ294720A (en) 1998-06-26

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AM AU BB BG BR BY CA CN CZ EE FI GE HU IS JP KG KP KR KZ LK LR LT LV MD MG MN NO NZ PL RO RU SG SI SK TJ TM TT UA US UZ VN

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): KE MW SD SZ UG AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 294720

Country of ref document: NZ

ENP Entry into the national phase

Ref document number: 2210731

Country of ref document: CA

Ref country code: CA

Ref document number: 2210731

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1995935663

Country of ref document: EP

ENP Entry into the national phase

Ref country code: JP

Ref document number: 1996 522835

Kind code of ref document: A

Format of ref document f/p: F

WWP Wipo information: published in national office

Ref document number: 1995935663

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1995935663

Country of ref document: EP