WO2002039486A9 - Integrated system for biological information - Google Patents
Integrated system for biological information
- Publication number
- WO2002039486A9 (PCT/US2001/049984)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- services
- client
- components
- interface
- component
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/20—Heterogeneous data integration
Definitions
- Figure 1 illustrates a typical sequence of steps a scientist might take to accomplish a relatively simple analysis task.
- To examine a DNA sequence in alignment with similar sequences from a public database, the scientist must use three different tools with three different interfaces and convert the output from each one to a format acceptable as input to the next. For more complex analysis, many other resources might be required. It has been reported that more than half of a scientist's time may be spent on tasks related to the integration of data from incompatible databases and software programs. Moreover, this approach of converting the output of one program into acceptable input for another greatly limits the integrative potential between components that are highly interactive in nature.
- A second approach, which emphasizes data access over analysis and visualization, is to enable complex declarative queries that span multiple heterogeneous databases.
- This approach has received considerable attention in bioinformatics, and has given rise to several systems.
- U.S. Patent 5,859,972 to Subramaniam et al. discloses such a method.
- By use of a translation algorithm, a user query is simultaneously parsed to multiple databases. During the translation, the query is formatted for the destination database.
- the method has inherent weaknesses as it either requires problematic on-the-fly mappings from representations in source databases to a definitive "ontology" (roughly, a global schema), or forces users to express queries in the sundry schemas of source databases.
- a third approach is to package heterogeneous software tools and databases as components adhering to standard, well-defined interfaces, according to which information can be exchanged.
- This approach encourages components to encapsulate their differences and expose only minimal, abstract attributes and behaviors.
- Related computer programming models supporting this approach include Common Object Request Broker Architecture (CORBA) and Distributed Component Object Model (DCOM).
- CORBA is being promoted by Object Management Group (OMG), a consortium of over 700 companies.
- The OMG publishes a set of standard interfaces defined in its Interface Definition Language (IDL), with components implemented using CORBA.
- the component-based approach has a number of advantages for the problem of integration for biological information.
- the main stumbling block to this third approach is the rigidity imposed by standardization of interfaces.
- the OMG defines its standards by committee, in a way that is fair but also slow and arduous. It defines interfaces in relatively specific terms, which encourages exchange of data without loss of information, but makes standards harder to agree upon and more constraining for implementers.
- the specification of standard interfaces, useful as it is, does not address how components are to be integrated. System builders are free to integrate components as they see fit, and often write top-level controllers that instantiate and call components directly. In this way, they fail to take full advantage of the great potential for flexibility offered by component-based design, in contrast to the present invention.
- One embodiment of the present invention comprises a computerized system for integrating object based data models comprising a software platform, one or more interface-based data models, and one or more component services.
- the software platform comprises a Client Environment and a Client Bus.
- the software platform comprises software instructions in an object oriented programming language.
- the Client Environment comprises a common user interface.
- the Client Bus brokers all communication between software components and further comprises an Event Channel function and a Services Broker function.
- the Event Channel enables components to register as listeners for particular types of events and react to them in prescribed ways.
- the Services Broker allows components to request and provide services to one another by requesting a service by name, or by requesting services by the types or attributes of the objects for which services are requested. Components, to the extent possible, interoperate without having direct knowledge of one another.
- the data models comprise at least one data model useful for bioinformatics research.
- Another embodiment of the present invention comprises a computerized system for integrating object based data models comprising a software platform, one or more interface-based data models, and one or more component services.
- the software platform comprises a Client Environment and a Client Bus.
- the software platform comprises software instructions in an object oriented programming language.
- the Client Environment comprises a common user interface.
- the Client Bus brokers all communication between software components and further comprises an Event Channel function and a Services Broker function.
- the Event Channel enables components to register as listeners for particular types of events and react to them in prescribed ways.
- the Services Broker allows components to request and provide services to one another by requesting a service by name, or by requesting services by the types or attributes of the objects for which services are requested. Components, to the extent possible, interoperate without having direct knowledge of one another.
- the data models comprise at least one high-level map for an Integrated Bioinformation system.
- a further embodiment of the present invention comprises a computerized system for integrating object based data models comprising a software platform, one or more interface-based data models, and one or more component services.
- the software platform comprises a Client Environment and a Client Bus.
- the software platform comprises software instructions in an object oriented programming language.
- the Client Environment comprises a common user interface.
- the Client Bus brokers all communication between software components and further comprises an Event Channel function and a Services Broker function.
- the Event Channel enables components to register as listeners for particular types of events and react to them in prescribed ways.
- the Services Broker allows components to request and provide services to one another by requesting a service by name, or by requesting services by the types or attributes of the objects for which services are requested. Components, to the extent possible, interoperate without having direct knowledge of one another.
- the data models comprise at least one high-level map for an Integrated Plant Bioinformation system.
- Figure 1 illustrates a schematic view of the typical steps a scientist might take for a simple bioinformatic research task.
- Figure 2 illustrates a schematic view of a simple embodiment of the system architecture of the invention.
- Figure 3 illustrates a schematic view of a simplified version of example interfaces of the invention.
- Figure 4 illustrates a screen view at a client user in accordance with one embodiment of the invention.
- Figure 5 illustrates a screen view at a client user in accordance with one embodiment of the invention wherein a pop-up window is displayed following dynamic discovery.
- Figure 6 illustrates a screen view at a client user in accordance with one embodiment of the invention wherein windows display synchronization aspects of the invention.
- Figure 7 schematically illustrates a high-level data map for a Map subsystem.
- Figure 8 schematically illustrates a high-level data map for a Sequence subsystem.
- Figure 9 schematically illustrates a high-level data map for a Metabolic Pathway subsystem.
- Figure 10 schematically illustrates a high-level data map for a Gene Expression subsystem.
- Figure 11 schematically illustrates a high-level data map for an Integrated Bioinformation system.
- BLAST - Basic Local Alignment Search Tool, a set of similarity search programs designed to explore all of the available sequence databases maintained by the National Center for Biotechnology Information (NCBI), regardless of whether the query is protein or DNA.
- CGI Common Gateway Interface - Set of rules that describe how an Internet server communicates with another application running on the same computer and how the application (called a CGI program) communicates with the Internet server.
- Any application can be a CGI program if it handles input and output according to the CGI standard.
- CORBA - Common Object Request Broker Architecture, an architecture-neutral, object-oriented client-server solution. With CORBA you can abstract an object by its services and publish these using the IDL (Interface Definition Language). A client can then connect to and use these services.
- IDL - Interface Definition Language
- Component - a reusable program building block that can be combined with other components in the same or other computers in a distributed network to form an application.
- Examples of a component include: a single button in a graphical user interface, a small interest calculator, an interface to a database manager, or a statistical analysis tool.
- Gene - unit of inheritance; a piece of the genetic material that determines the inheritance of a particular characteristic, or group of characteristics. Genes are carried by chromosomes in the cell nucleus and are arranged in a line along each chromosome.
- Genome - the complete collection of an organism's genetic material. The human genome is composed of an estimated 50,000 to 150,000 genes located on the 23 pairs of chromosomes in a human cell.
- Instantiate - to create a particular realization of an abstraction; for example, defining a particular variation of object within a class, giving it a name, and locating it in some physical place.
- Java - an object-oriented programming language designed for use on the Internet. Java can be used to create applications that may run on a single computer or be distributed among servers and clients in a network.
- Java servlet - a small program that runs on a server but is invoked by a client request.
- the server must be running Java.
- Metabolic proteins - enzymes and transport proteins including physico-chemical properties of these peptides, kinetic modifiers and external links to sequence and structural databases.
- Pathways - the sequences of steps that compose each pathway, including their direction, catalyst (if any; uncatalyzed steps are also supported), modifiers, and location information (at organism or intracellular level).
- Taxonomy (or, more properly, Systematics) - the classifying of organisms into a series of hierarchical groups.
- the criteria that different taxonomic systems use for defining these groups may be very different (morphological, molecular, etc.), but in general these groups tend to be reflective of common evolutionary descent. Taxonomic studies may seek to understand the known similarities and differences between species based on their evolutionary history, or they may attempt to use known attributes from a highly studied species to postulate unknown characteristics.
- the present invention comprises an integrated, object-based, computing software platform, hereinafter referred to as the "integrated system" or "IS," that enables genomic researchers and bioinformaticians to access and utilize disparate bioinformatic software tools and data sources in a seamless, unified environment.
- the disparate software tools and data comprise “components” that are integrated by the invention.
- a “component” is a coherent package of software that can be independently developed and delivered as a unit, and that defines interfaces by which it can be composed with other components to provide and use services.
- a gene expression component may include a relational database engine, a suite of analyzing tools, and an upload engine for storing, analyzing and downloading gene expression data.
- the present invention includes a novel software platform integrating bioinformation data models.
- One embodiment of the present invention's software platform is partially illustrated in figure 2.
- figure 2 schematically illustrates a portion of the system architecture. It shows several components on the client side, integrated by way of the "IS Platform". Additional aspects of the invention include the platform, the exchange of services, the exchange of events, how non-platform components fit into the system, and the data model that unifies the system.
- the "IS Platform" comprises two components that are responsible for the integration and management of all others: the ClientBus and the IS Client Environment (ICE).
- the ICE is the simple user interface that allows a user to invoke components directly.
- the ClientBus is responsible for all communication between components. It further comprises a combination of an Event Channel (sometimes called a bus) and a Broker.
- the ClientBus enables components to register as "listeners" for particular types of events, or occurrences, and react to them in prescribed ways.
- As a Broker it allows components to request and provide services to one another, as defined by service name, or by types and attributes of objects for which services are requested. In both cases, components interoperate without having direct knowledge of one another, to promote flexibility.
- the interface to the ClientBus provides: (1) registry of a component as a listener for a particular class of events; (2) broadcast of an event, generally linked to a specific underlying data object; (3) registry of a component as a provider of particular services; (4) request of services by name; and (5) request of services applicable for a particular object.
- the first two relate to events, and the final three to services.
- Event exchange is a mechanism to allow synchronized behavior among components.
- the Event Channel of the invention is a variant of the standard Observer pattern that allows components to exchange events without registering directly with one another. Instead they remain decoupled and interact only with the Event Channel, which behaves as a mediator among components.
- any component may register with the bus as a listener for any type of event.
- Event objects generally are given references to associated data objects. Once a component is registered, all events of the specified type will result in calls to the component's listener method, as long as the two components are defined as being synchronized.
- the invention additionally allows deactivation of synchronization where it is unnecessary or confusing.
- In figure 2, when a user selects a matching sequence in the SimilaritySearcher, that component generates an ItemSelectionEvent and "fires" it by calling a method on the ClientBus. This event holds a reference to a data object representing the selected homolog.
- If the SequenceViewer is synchronized with the SimilaritySearcher (for example, if synchronization was requested by the SequenceViewer Factory when it originally spawned the SimilaritySearcher), then the ClientBus will call the SequenceViewer's registered listener for ItemSelectionEvents. Finally, this listener, within the SequenceViewer, inspects the data object for an identifier, locates the corresponding object among the features it is displaying, and highlights it as selected.
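- The event exchange described above can be sketched in Java roughly as follows. This is a minimal illustration, not the patent's implementation: only the names ItemSelectionEvent, SimilaritySearcher and SequenceViewer come from the text, while `EventChannel`, `SelectionListener`, `SequenceViewerStub`, the method names, and the use of component-name strings to track synchronization are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Event holding a reference to the selected data object (fields are assumed). */
class ItemSelectionEvent {
    final String sourceComponent; // name of the component that fired the event
    final Object selectedItem;    // reference to the underlying data object

    ItemSelectionEvent(String sourceComponent, Object selectedItem) {
        this.sourceComponent = sourceComponent;
        this.selectedItem = selectedItem;
    }
}

/** Callback that a component registers with the channel (hypothetical). */
interface SelectionListener {
    void itemSelected(ItemSelectionEvent event);
}

/** Mediator: components register with the channel, never with one another. */
class EventChannel {
    private final Map<String, List<SelectionListener>> listeners = new HashMap<>();
    private final Set<String> syncPairs = new HashSet<>();

    void register(String component, SelectionListener listener) {
        listeners.computeIfAbsent(component, k -> new ArrayList<>()).add(listener);
    }

    /** Marks two components as synchronized, in both directions. */
    void synchronize(String a, String b) {
        syncPairs.add(a + "|" + b);
        syncPairs.add(b + "|" + a);
    }

    /** Deactivates synchronization where it is unnecessary or confusing. */
    void desynchronize(String a, String b) {
        syncPairs.remove(a + "|" + b);
        syncPairs.remove(b + "|" + a);
    }

    /** Delivers the event only to components synchronized with its source. */
    void fire(ItemSelectionEvent event) {
        for (Map.Entry<String, List<SelectionListener>> entry : listeners.entrySet()) {
            if (syncPairs.contains(event.sourceComponent + "|" + entry.getKey())) {
                for (SelectionListener listener : entry.getValue()) {
                    listener.itemSelected(event);
                }
            }
        }
    }
}

/** Stand-in for the SequenceViewer: highlights the displayed feature that matches. */
class SequenceViewerStub implements SelectionListener {
    final List<String> displayedFeatureIds = new ArrayList<>();
    String highlighted; // stays null until a displayed feature is selected

    @Override
    public void itemSelected(ItemSelectionEvent event) {
        String id = String.valueOf(event.selectedItem);
        if (displayedFeatureIds.contains(id)) {
            highlighted = id;
        }
    }
}
```

- In this sketch the firing component never holds a reference to the viewer; whether the viewer reacts depends only on the synchronization state kept by the channel.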
- the other aspect of the ClientBus is the Broker.
- the Broker of the ClientBus allows decoupled components to interact. It comprises a "pull" mechanism in that the receiving component is provided services when it requests them.
- a Broker follows the CORBA standards and generally is applied to systems having distributed components.
- the Broker of the present invention differs from a traditional CORBA broker in three main ways.
- First, services are objects, passed via the broker from service providers to service consumers. They encapsulate their behavior using the Command pattern. This is a relatively minor change that allows services to be passed among components, and that causes specific services (analogous to methods), rather than servers with specific interfaces (collections of related methods), to be the unit of exchange.
- Second, the Broker runs on the client, and allows direct interoperation only among client-side components. As a result, client-side and server-side proxies handling tasks associated with network communication are not necessary.
- Third, the ClientBus supports Dynamic Discovery, by which registered service providers evaluate a particular data object and respond whether or not their services are suitable for it, and suitable services are forwarded to a requester.
- the SequenceViewer Client has identified a data object associated with the selected graphical object.
- a service request is transmitted to the ClientBus.
- the ClientBus has forwarded the request to all registered service providers.
- the service providers include the SequenceServer Proxy and the SequenceAnalysis Proxy. Each service provider inspects the object and, if it finds the object's type recognizable and sufficient data present, creates appropriate Service objects and passes them to the ClientBus.
- the ClientBus returns these service objects to the SequenceViewer Client.
- the SequenceViewer Client displays their descriptions in a pop-up menu. The user can then select one of these descriptions, which when selected, invokes the corresponding service, passing it the original data object.
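- The Dynamic Discovery steps above, with services passed around as Command-style objects, might look like the following Java sketch. It is illustrative only: the provider names echo the SequenceServer Proxy and SequenceAnalysis Proxy of the text, but the `Service` and `ServiceProvider` signatures, the `SequenceRef` data object, the `DiscoveryBroker` class, and the accession value used in any test are assumptions, and the "services" return canned strings instead of doing real work.

```java
import java.util.ArrayList;
import java.util.List;

/** A service is a command object: a menu description plus behavior (assumed shape). */
interface Service {
    String description();
    String invoke(Object data);
}

/** Providers inspect a data object and return the services suitable for it. */
interface ServiceProvider {
    List<Service> servicesFor(Object data);
}

/** Hypothetical data object standing in for a selected sequence feature. */
class SequenceRef {
    final String accession;
    final String residues; // may be null when only an identifier is known

    SequenceRef(String accession, String residues) {
        this.accession = accession;
        this.residues = residues;
    }
}

/** Forwards a request to every registered provider and collects the offers. */
class DiscoveryBroker {
    private final List<ServiceProvider> providers = new ArrayList<>();

    void register(ServiceProvider provider) {
        providers.add(provider);
    }

    List<Service> discover(Object data) {
        List<Service> offered = new ArrayList<>();
        for (ServiceProvider provider : providers) {
            offered.addAll(provider.servicesFor(data));
        }
        return offered;
    }
}

/** Offers retrieval for any SequenceRef that carries an accession. */
class SequenceServerProxy implements ServiceProvider {
    @Override
    public List<Service> servicesFor(Object data) {
        List<Service> out = new ArrayList<>();
        if (data instanceof SequenceRef && ((SequenceRef) data).accession != null) {
            out.add(new Service() {
                public String description() { return "Retrieve full-length sequence"; }
                public String invoke(Object d) { return "fetch:" + ((SequenceRef) d).accession; }
            });
        }
        return out;
    }
}

/** Offers analysis only when the residues themselves are present. */
class SequenceAnalysisProxy implements ServiceProvider {
    @Override
    public List<Service> servicesFor(Object data) {
        List<Service> out = new ArrayList<>();
        if (data instanceof SequenceRef && ((SequenceRef) data).residues != null) {
            out.add(new Service() {
                public String description() { return "Compute sequence length"; }
                public String invoke(Object d) { return "length:" + ((SequenceRef) d).residues.length(); }
            });
        }
        return out;
    }
}
```

- A requesting client would show each returned `description()` in its pop-up menu and call `invoke` on the chosen service with the original data object, without knowing which provider created it.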
- Non-platform components directly belonging to the system (i.e., components that are neither part of the platform nor external servers) implement one of two interfaces, Client and ServiceProvider. Implementers of Client may fire and receive events, and implementers of ServiceProvider may register and provide services. Any component may request a service. Also, a single component may implement both interfaces (although this is not usually done).
- ServiceProviders usually act as server proxies or client factories.
- Server proxies are gateways to servers. They provide a layer of insulation between the system and external resources. This is valuable in two ways: if the interface to an external resource changes, alterations to the system may be confined to the server proxy; and server proxies that fulfill the same abstract responsibilities (e.g., a server proxy that executes BLAST searches at the National Center for Genome Resources, and one that executes BLAST searches at the National Center for Biotechnology Information) can be interchanged without perturbing the rest of the system.
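- The interchangeability of server proxies can be illustrated with a short Java sketch. Everything here is hypothetical: `SimilaritySearchProxy`, `CannedSearchProxy` and `SearchLauncher` are invented names, and a canned in-memory list with plain substring matching stands in for a remote BLAST search, which this example does not attempt to perform.

```java
import java.util.ArrayList;
import java.util.List;

/** Abstract responsibility shared by interchangeable proxies (hypothetical interface). */
interface SimilaritySearchProxy {
    /** Returns identifiers of background sequences that match the query. */
    List<String> search(String query);
}

/**
 * Stand-in for a proxy to one search site. A real proxy would contact an external
 * server; here an in-memory list and substring matching replace the remote search.
 */
class CannedSearchProxy implements SimilaritySearchProxy {
    private final String siteLabel;
    private final List<String> backgroundSequences;

    CannedSearchProxy(String siteLabel, List<String> backgroundSequences) {
        this.siteLabel = siteLabel;
        this.backgroundSequences = new ArrayList<>(backgroundSequences);
    }

    @Override
    public List<String> search(String query) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < backgroundSequences.size(); i++) {
            if (backgroundSequences.get(i).contains(query)) {
                hits.add(siteLabel + ":" + i);
            }
        }
        return hits;
    }
}

/** Depends only on the interface, so the proxy behind it can be swapped at will. */
class SearchLauncher {
    private SimilaritySearchProxy proxy;

    SearchLauncher(SimilaritySearchProxy proxy) {
        this.proxy = proxy;
    }

    void setProxy(SimilaritySearchProxy proxy) {
        this.proxy = proxy;
    }

    List<String> run(String query) {
        return proxy.search(query);
    }
}
```

- Because `SearchLauncher` sees only the interface, a change in how one site is contacted would be confined to that site's proxy class.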
- Client factories service providers
- Servers are necessary for the system but are external to it. Any resource that can be called by a server proxy, generally as a provider of data, can function as a server. Such a resource may exist on the Internet, on a local network, or on the same machine as the client. (In general, the problem of scalability can be addressed this way: processes involving large quantities of data or intensive communication are handled on appropriate servers, which are integrated in the client-side environment via server proxies.) Communication with remote servers can occur using any common protocol, including HTTP, CORBA, and Java RMI.
- An accompanying client factory, also small and simple, must be created to support the client.
- Data providers, either in the form of databases or analysis tools, are generally added as servers and accessed via server proxies. Since server proxies have complete freedom in the retrieval of information, they may fulfill their responsibilities without using a true server (in the conventional sense of an interacting local or remote process).
- a server proxy may invoke methods of a utility library, or spawn a child process.
- the software platform of the invention adds powerful features of flexibility and Dynamic Discovery to traditional approaches for integrating heterogeneous data components and services.
- Preferred embodiments of the invention apply the integrated software platform to data models of particular usefulness for bioinformatic research.
- the software platform is applied to data models that take advantage of the fact that there exist a relatively small number of fundamentally important classes of biological objects that tend to resist major schematic variation: e.g., DNA sequences, genes, proteins, enzymes, pathways, maps, mappable elements, taxonomic designations.
- these classes form a common denominator for biological databases, with differences among schemas tending to occur in less essential details (e.g., how feature locations are represented, what kinetic parameters are represented for enzymes).
- GSDB Gene Sequence Database maintained by National Center for Genome Resources
- PathDB Metabolic Pathways Database maintained by the National Center for Genome Resources
- GSDB and TAIR The Arabidopsis Information Resource maintained by the National Center for Genome Resources and Carnegie Institution of Washington
- the preferred embodiment of the invention exposes only the fundamental classes in component interfaces, and encapsulates manipulation of the less important details within the components. The cost of this is small compared to the advantages it offers in system flexibility, because the details tend not to be important across component boundaries.
- a search for enzymes having particular kinetic properties can be encapsulated within a pathway-searching component.
- When looking across components to find genes in GSDB associated with the particular enzymes, only the relationship between the fundamental classes for enzymes and genes is used.
- the kinetic parameters are generally important only indirectly for cross-database uses, so they are not exposed in the pathway component's interface, to avoid unnecessary interdependencies with other components. Java interfaces have been found to support implementation of this approach.
- An interface can be defined for each of the fundamental classes, representing its attributes (via accessor methods) and essential behaviors. Interfaces can inherit multiply from one another, classes can implement multiple interfaces, and interfaces can include methods that declare associations with other interfaces. Consequently, a single object can simultaneously represent multiple fundamental classes.
- Figure 3 illustrates an example prepared using Java.
- This novel approach differs from the standard Model-View-Controller (MVC) approach described in A System of Patterns: Pattern-Oriented Software Architecture, written by Buschmann, et al.
- components exchange information about private models and views in their Java interfaces (components can still use the MVC pattern internally). Since they can implement these interfaces however they choose, private models can be kept at a high-level or made arbitrarily rich.
- the TaxonomyBrowser of figure 3 has a much richer implementation of the interface IsysTaxon than does the SimilaritySearcher; but because certain objects in both components are understood to be of type IsysTaxon, they can be exchanged readily.
- the TaxonomyBrowser can change its implementation however it wants without disrupting the SimilaritySearcher, as long as the interface is kept the same.
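- A minimal Java sketch of this interface-based data model follows. Only the interface name IsysTaxon appears in the text; its `getScientificName` method, the second interface `HasAccession`, and the classes `HitTaxon`, `BrowserTaxon` and `TaxonLabel` are assumptions introduced to show a sparse and a rich implementation being exchanged through the same type, and one object implementing several interfaces at once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Fundamental class exposed across components; the accessor is an assumed example. */
interface IsysTaxon {
    String getScientificName();
}

/** A second, hypothetical interface, to show one object representing two classes. */
interface HasAccession {
    String getAccession();
}

/** Sparse implementation, as a similarity searcher might keep for each hit. */
class HitTaxon implements IsysTaxon, HasAccession {
    private final String name;
    private final String accession;

    HitTaxon(String name, String accession) {
        this.name = name;
        this.accession = accession;
    }

    @Override
    public String getScientificName() { return name; }

    @Override
    public String getAccession() { return accession; }
}

/** Richer private model, as a taxonomy browser might keep; lineage stays internal. */
class BrowserTaxon implements IsysTaxon {
    private final String name;
    private final List<String> lineage = new ArrayList<>();

    BrowserTaxon(String name, List<String> lineage) {
        this.name = name;
        this.lineage.addAll(lineage);
    }

    @Override
    public String getScientificName() { return name; }

    /** Detail used only inside the browser; not part of the shared interface. */
    List<String> getLineage() {
        return Collections.unmodifiableList(lineage);
    }
}

/** Any consumer written against IsysTaxon accepts either implementation. */
class TaxonLabel {
    static String of(IsysTaxon taxon) {
        return "Taxon: " + taxon.getScientificName();
    }
}
```

- Changing how `BrowserTaxon` stores its lineage would not affect `TaxonLabel` or `HitTaxon`, since neither depends on anything beyond the shared interface.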
- the resulting data models of the invention do not represent a competing standard for representing bioinformatics data, but are compatible with developing standards (Life Sciences Research Task Force, 1997). The reason is that the data models of the invention are able to be more abstract and less comprehensive than other models and can be easily mapped to subsets of them. Components may employ more detailed data maps for intra- component use, while simultaneously communicating with other components via higher-level data models. They may even choose to interact with one another directly at lower levels at the cost of decreased insulation from changes to one another.
- Figures 4, 5, and 6 illustrate the user interface displays that result when a user of the present invention is performing the same analysis steps executed in figure 1.
- Figures 4, 5, and 6 depict two components, SequenceViewer 100 and SimilaritySearcher 104.
- the SequenceViewer 100 displays colored bars 102 representing a DNA sequence and its annotations (parcels of information about sequence segments based on laboratory experiments or computer analysis). The sequence is represented at top (off the screen here), with the 5' end at left and the 3' end at right.
- the annotations are tiled beneath it, each bar color representing a different type (e.g., gene, exon, intron, transcription binding site, and region of high similarity to another sequence).
- the tool lets users scroll and zoom the display and view detailed description of annotations.
- all visible bars represent regions of high similarity to other sequences (i.e., all are of the same type; the ones of darker color are simply selected).
- the SimilaritySearcher 104 launches searches of single query sequences against large databases of "background sequences" and displays search results.
- Figures 4, 5, and 6 show the SimilaritySearcher's results browser (i.e., the search has already been launched and the results returned).
- the top pane 106 of the browser is a summary table of the search results, each row of which represents a single matching sequence from the background database.
- the bottom pane 108 displays details about the first selected match, including an alignment of the query sequence and background sequence in the regions of high similarity.
- the search was executed using a popular program called BLAST.
- SimilaritySearcher and SequenceViewer are separate and have no direct dependencies on one another; nevertheless, they appear to the user to be closely integrated.
- the SimilaritySearcher 104 was invoked from the SequenceViewer 100, which passed it the text of the displayed sequence for use as a query sequence (using the present invention, the user began in the SequenceViewer 100, rather than with a text file as in figure 1).
- the annotations 112, visible in figure 5 in SequenceViewer 100 were not originally present; they reflect the similarity search hits in the other component, and only appeared when the search returned and the browser appeared. Selection and visibility of the search hits are synchronized in the two components.
- the pop-up menu 114 obscuring part of the SimilaritySearcher 104 in figure 5 appears when the user clicks the right button of the mouse after selecting several of the rows in the search-results table 106.
- the options appearing in this menu 114 are dynamically generated and can be restricted to the set of components that a user has installed and registered with the integrated system security controller. Using a process denoted as Interactive Discovery, registered components are interrogated about the selected data set. Those that respond that they can operate on the data set are represented in menu 114 with appropriate descriptions.
- Interactive Discovery is an important aspect of the present invention. This mechanism encourages an exploratory mode of usage in which new paths through the bioinformatics system emerge dynamically, according to the selected data and the components that are present. If the user selects the option to "Perform multiple sequence alignment" from menu 114, he is presented first with interface window 116 of figure 6 to a component program that computes a multiple alignment of the selected sequences. This, in turn, invokes interface window 118 to a component program that provides an editable display of the alignment.
- This example illustrates how various components in this embodiment of the invention interact in an integrated fashion.
- the program invoked through window 116 of figure 6, which is available as a computer program written in C, is wrapped with a service provider, a component that registers abstract services with the integrated system but hides how they are implemented.
- the program invoked through window 118 of figure 6, available as a Java application, is wrapped with a simple Java class that makes it appear to the integrated system like any other client. This Java wrapper handles the broadcast and reception of integrated system events and transmits them in terms the program can understand.
- the full-length sequences required for the multiple sequence alignment, not available in SimilaritySearcher 104 (it knows only of segments of background sequences), are retrieved from a public database of DNA sequences. This occurs transparently to the user by way of another application service provider that acts as a proxy to a resource available on the Internet.
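- The Java wrapper mentioned above, which makes an existing application look like any other client, could be sketched along these lines. The sketch covers only the reception of events, not their broadcast, and all names (`AlignmentEditor`, `SelectionAwareClient`, `AlignmentEditorWrapper`, and their methods) are hypothetical stand-ins, since the text does not describe the wrapped program's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for an existing application with its own API (hypothetical). */
class AlignmentEditor {
    private int highlightedRow = -1;

    void highlightRow(int row) {
        highlightedRow = row;
    }

    int getHighlightedRow() {
        return highlightedRow;
    }
}

/** What the integrated system expects of a client, reduced to one callback (assumed). */
interface SelectionAwareClient {
    void sequenceSelected(String sequenceId);
}

/** Wrapper: receives system events and restates them as calls the editor understands. */
class AlignmentEditorWrapper implements SelectionAwareClient {
    private final AlignmentEditor editor;
    private final List<String> rowIds; // sequence identifier for each alignment row

    AlignmentEditorWrapper(AlignmentEditor editor, List<String> rowIds) {
        this.editor = editor;
        this.rowIds = new ArrayList<>(rowIds);
    }

    @Override
    public void sequenceSelected(String sequenceId) {
        int row = rowIds.indexOf(sequenceId);
        if (row >= 0) {
            editor.highlightRow(row); // translate an identifier into the editor's row index
        }
    }
}
```

- The wrapped editor is left unmodified; the translation between system identifiers and the editor's own row indices lives entirely in the wrapper.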
- Figure 11 illustrates one embodiment of an integrated data model useful with the invention.
- Figure 11 is a composite of four separate high-level data models represented in figures 7, 8, 9, and 10. Fundamental linkages and relationships are highlighted in figure 11 with thicker lined representations.
- Metabolite shows as a link between the data model for Metabolic Pathways of figure 9 and the data model for Gene Expression illustrated in figure 10.
- FIG. 7 illustrates a data model for a Map Subsystem.
- This subsystem deals with high-level linear representations of genetic information, e.g. chromosome-level descriptions of the placement of genetic markers.
- Maps 40 and MappableObjects 42 are basic entities.
- a Map represents a linear segment of a biological macro-molecule such as DNA or protein which may or may not be characterized in terms of the exact sequence of its subunits, but upon which entities that are "mappable" (by some method such as recombinant crossing experiment) have been assigned location.
- a MappableObject 42 is an abstraction representing anything that can be located on a Map, e.g. genetic markers, QTLs, genes.
- a MappableObject 42 is also the linkage point between the Map subsystem and other subsystems, whose elements may be "mappable."
- A MappableObject can appear on many different Maps, and a Map comprises many different MappableObjects. This relationship is effected through the MappedObject 44 class, which assigns a location to a MappableObject in the context of a given Map.
- Maps may be recursively constructed out of other maps.
- MappedObjects on different Maps may be related to one another by having their Maps located with respect to one another on another Map.
- A Chromosome can be a Map composed of smaller Maps of regions that have been individually characterized.
- The objects in the individual regions can be related to the Chromosome by means of their location on their own Map and that Map's location on the Chromosome map.
- A genetic map 46 represents distances in terms of recombination probabilities, while a physical map 48 represents distances in terms of base pairs.
- All sequences inherit from PhysicalMap to allow their features to be specified as MappedObjects with base pair coordinates.
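The Map subsystem described above can be sketched in Python. This is only an illustrative reading of the data model, not the implementation disclosed in the application: the method names `place`, `place_submap`, and `locate`, the `unit` attribute, and the use of integer offsets are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


class MappableObject:
    """Anything that can be located on a Map, e.g. a marker, QTL, or gene."""

    def __init__(self, name: str) -> None:
        self.name = name


class MappedObject:
    """Assigns a location to a MappableObject in the context of one Map."""

    def __init__(self, obj: MappableObject, position: int) -> None:
        self.obj = obj
        self.position = position


class Map:
    """A linear segment on which mappable entities have been assigned locations."""

    unit = "arbitrary"  # hypothetical attribute naming the distance unit

    def __init__(self, name: str) -> None:
        self.name = name
        self.mapped_objects: List[MappedObject] = []
        self.submaps: List[Tuple[int, "Map"]] = []  # (offset on this map, smaller map)

    def place(self, obj: MappableObject, position: int) -> None:
        self.mapped_objects.append(MappedObject(obj, position))

    def place_submap(self, submap: "Map", offset: int) -> None:
        # Maps may be built recursively out of other maps.
        self.submaps.append((offset, submap))

    def locate(self, obj: MappableObject) -> Optional[int]:
        """Return the position of obj on this map, searching submaps recursively."""
        for mapped in self.mapped_objects:
            if mapped.obj is obj:
                return mapped.position
        for offset, submap in self.submaps:
            local = submap.locate(obj)
            if local is not None:
                return offset + local
        return None


class GeneticMap(Map):
    unit = "recombination"  # distances as recombination probabilities


class PhysicalMap(Map):
    unit = "bp"  # distances in base pairs


# A chromosome composed of an individually characterized region.
chromosome = PhysicalMap("chromosome 1")
region = PhysicalMap("region A")
gene = MappableObject("geneX")
region.place(gene, 1500)
chromosome.place_submap(region, 100000)
print(chromosome.locate(gene))  # 101500
```

Adding offsets is only straightforward for base-pair coordinates; a real genetic map would need a mapping function to combine recombination distances, which this sketch does not attempt.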
- Figure 8 illustrates the Sequence subsystem 50 data model.
- This subsystem deals with genetic and protein information in terms of its encoding as a linear string of characters. Its major entities are the sequences themselves (DNA 56, RNA 54, and AA 58) and characterizations of functional features of sub-spans of these sequences, such as genes, motifs, repeats, promoters and binding sites.
- Another major responsibility of the Sequence subsystem 50 is the representation of homology relationships 52 between sequences. In general, these are based on alignments of sets of sequences; however, other subsystems will generally only be interested in the existence and level of homology between entities that have sequence-based representations.
- Sequences are considered a kind of PhysicalMap, and their Features are MappedObjects that are located with respect to those Sequences.
- Some of the details that are managed by the Sequence subsystem but hidden from the view of other subsystems include the representation of confidence in the characters of the sequence, as well as other details relating to sequencing experiments, such as the libraries used to obtain the clone.
- Another major detail that is best hidden from other subsystems is the fact that entities such as genes are often represented by multiple sequences, and are typically either only partially represented by or represented as a subspan of a given Sequence.
- These facets of the sequencing process are hidden from other subsystems, so that if another subsystem asks for the "sequence" of a given gene, it may receive a consensus sequence representing a composite view of the subspans of all sequences that include this gene.
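One way to picture how a consensus sequence could be served to other subsystems is a per-position majority vote over the subspans that cover a gene. The sketch below is an assumption made for illustration: the application does not state how the consensus is computed, and the names `GeneSpan`, `gene_offset`, and `consensus_for_gene` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Sequence:
    """A sequenced molecule, stored as a linear string of characters."""
    name: str
    residues: str


@dataclass
class GeneSpan:
    """The subspan of one Sequence that covers part (or all) of a gene."""
    sequence: Sequence
    start: int        # 0-based start within the sequence
    end: int          # exclusive end within the sequence
    gene_offset: int  # where this subspan begins in gene coordinates


def consensus_for_gene(spans: List[GeneSpan], gene_length: int) -> str:
    """Majority vote per gene position; 'N' where no sequence covers the position."""
    columns = [Counter() for _ in range(gene_length)]
    for span in spans:
        fragment = span.sequence.residues[span.start:span.end]
        for i, base in enumerate(fragment):
            pos = span.gene_offset + i
            if 0 <= pos < gene_length:
                columns[pos][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# Three overlapping reads; the third disagrees with the others at position 3.
spans = [
    GeneSpan(Sequence("read1", "TTACGTAC"), start=2, end=8, gene_offset=0),
    GeneSpan(Sequence("read2", "GTACGG"), start=0, end=6, gene_offset=2),
    GeneSpan(Sequence("read3", "ACGAAC"), start=0, end=6, gene_offset=0),
]
print(consensus_for_gene(spans, gene_length=9))  # ACGTACGGN
```

A caller in another subsystem would see only the returned string, not the individual reads or their disagreements, which matches the information hiding described above.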
- FIG. 9 illustrates the high level data model for the Metabolic Pathways 60 subsystem.
- This subsystem deals with metabolism, i.e., the networks of transformations of chemical compounds that are catalyzed by genetically encoded enzymes. Its basic entities are Metabolites 62, MetabolicSteps 64, Catalysts 66, and MetabolicPathways 60. Metabolites are chemical compounds that are transformed into other compounds by MetabolicSteps. MetabolicSteps may be spontaneous, or they may require Catalysts to proceed under the conditions found in living cells. These Catalysts are proteins 68 that are encoded by genes.
- MetabolicPathways 60 are connected sets of MetabolicSteps 64, in which the products of one step become the substrates of another step, and so on. These networks of chemical transformations can be quite complex, and can include MetabolicSteps 64, which transport Metabolites 62 to different locations in the organism.
- The Metabolic Pathways subsystem represents metabolic potentials that exist in organisms; that is, its MetabolicSteps 64 represent chemical transformations that can take place under the appropriate conditions (e.g., if substrates are present and enzymes are expressed), and MetabolicPathways 60 represent networks of reactions that can exist, provided that all the reactions do take place under the same conditions. This information is of interest in itself and has application to fields such as Metabolic Engineering. However, it is also useful to assess which reactions and which pathways actually do exist in specific organisms under specific conditions. Although the Pathway subsystem does make some attempt to represent this information, complete information regarding the state of living systems is currently conceived as belonging to the Expression subsystem.
- The Metabolic Pathways subsystem contains much internal detail regarding the kinetic and thermodynamic properties of reactions, which is useful for modeling tools; however, this information should not be exposed to other subsystems, as it is relatively complex and of little direct usefulness elsewhere.
- Metabolic Pathways 60 links to other subsystems most strongly through Catalysts 66 that are proteins 68 (encoded by genes) represented in terms of their catalytic function. Both Catalysts 66 and Metabolites 62 will also have connections to the Expression subsystem.
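The entities of the Metabolic Pathways subsystem can be sketched as follows. This is a simplified illustration under stated assumptions: the methods `can_proceed` and `is_connected_pathway` are hypothetical, a pathway is treated as an ordered chain rather than a general network, and cofactors such as ATP are omitted from the example steps.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Metabolite:
    name: str


@dataclass(frozen=True)
class Catalyst:
    """A protein, encoded by a gene, represented by its catalytic function."""
    name: str
    gene: str


@dataclass(frozen=True)
class MetabolicStep:
    name: str
    substrates: FrozenSet[Metabolite]
    products: FrozenSet[Metabolite]
    catalyst: Optional[Catalyst] = None  # None models a spontaneous step

    def can_proceed(self, present: Set[Metabolite], expressed_genes: Set[str]) -> bool:
        """A step is possible if substrates are present and any enzyme is expressed."""
        has_substrates = self.substrates <= present
        has_enzyme = self.catalyst is None or self.catalyst.gene in expressed_genes
        return has_substrates and has_enzyme


def is_connected_pathway(steps: List[MetabolicStep]) -> bool:
    """True if each step consumes at least one product of the step before it."""
    return all(prev.products & nxt.substrates for prev, nxt in zip(steps, steps[1:]))


# Illustrative data: the first two steps of glycolysis, cofactors omitted.
glucose = Metabolite("glucose")
g6p = Metabolite("glucose-6-phosphate")
f6p = Metabolite("fructose-6-phosphate")
step1 = MetabolicStep("phosphorylation", frozenset({glucose}), frozenset({g6p}),
                      Catalyst("hexokinase", gene="HK1"))
step2 = MetabolicStep("isomerization", frozenset({g6p}), frozenset({f6p}),
                      Catalyst("glucose-6-phosphate isomerase", gene="GPI"))

print(is_connected_pathway([step1, step2]))    # True
print(step1.can_proceed({glucose}, {"HK1"}))   # True
print(step1.can_proceed({glucose}, set()))     # False
```

The `can_proceed` check reflects the distinction drawn in the text between a metabolic potential and what actually occurs: whether the enzyme is expressed is information that would come from the Expression subsystem.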
- Figure 10 illustrates the data model for the Expression subsystem.
- The Expression subsystem attempts to represent the state of cellular systems under specific conditions.
- State is represented in terms of levels of abundance of mRNA transcripts (GeneExpression), Proteins (Proteomics), and Metabolites (Metabolomics). These levels will be assayed via Experiments that will generate Profiles of the level of expression of these substances under various Conditions.
- A Gene Expression experiment 72 may generate Profiles for a set of genes 80 in various stages of the organism's development, or in different tissues, or as subjected to different environmental stresses.
- The Expression subsystem will tie to the Metabolic Pathways subsystem via Proteins and Metabolites, and will tie to the Sequence subsystem via Genes or Transcripts.
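The relationship between Experiments, Profiles, and Conditions might be modeled as in the following sketch. The field names and the `levels_for` helper are assumptions introduced for illustration, and the abundance values are made-up example data, not results from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Condition:
    """The circumstances of an assay, e.g. a tissue, stage, or stress."""
    description: str


@dataclass
class Profile:
    """Abundance levels of substances measured under one Condition."""
    condition: Condition
    levels: Dict[str, float]  # substance identifier -> abundance


@dataclass
class Experiment:
    kind: str  # e.g. "GeneExpression", "Proteomics", "Metabolomics"
    profiles: List[Profile] = field(default_factory=list)

    def add_profile(self, condition: Condition, levels: Dict[str, float]) -> None:
        self.profiles.append(Profile(condition, dict(levels)))

    def levels_for(self, substance: str) -> Dict[str, float]:
        """Abundance of one substance across every Condition that measured it."""
        return {
            p.condition.description: p.levels[substance]
            for p in self.profiles
            if substance in p.levels
        }


# Made-up example: one gene expression experiment across two tissues.
experiment = Experiment("GeneExpression")
experiment.add_profile(Condition("root"), {"geneA": 2.0, "geneB": 0.5})
experiment.add_profile(Condition("leaf"), {"geneA": 8.0})
print(experiment.levels_for("geneA"))  # {'root': 2.0, 'leaf': 8.0}
```

Keying levels by a substance identifier is what would let the same structure tie to Genes or Transcripts in the Sequence subsystem and to Proteins or Metabolites in the Metabolic Pathways subsystem.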
- Each bioinformatics component interacts directly only with the IS platform, which serves as the medium for the exchange of events and services and permits components to remain decoupled.
- The exchange of events allows components to synchronize their behavior, and the exchange of services allows them to draw on one another's capabilities to retrieve data, perform analyses, or present user interfaces.
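The decoupling described above, in which components talk only to the platform, resembles a combined event bus and service registry. The following sketch illustrates that general pattern and is not the actual platform API: the `Platform` class, its method names, and the two example components are all hypothetical.

```python
from typing import Any, Callable, Dict, List


class Platform:
    """Hypothetical stand-in for the platform: brokers events and services."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Callable[[Any], None]]] = {}
        self._services: Dict[str, Callable[..., Any]] = {}

    def subscribe(self, event_type: str, handler: Callable[[Any], None]) -> None:
        self._listeners.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: Any) -> None:
        # Events let components synchronize without knowing about each other.
        for handler in list(self._listeners.get(event_type, [])):
            handler(payload)

    def register_service(self, name: str, provider: Callable[..., Any]) -> None:
        self._services[name] = provider

    def request(self, name: str, *args: Any, **kwargs: Any) -> Any:
        # Services let components draw on one another's capabilities.
        if name not in self._services:
            raise LookupError(f"no provider registered for service {name!r}")
        return self._services[name](*args, **kwargs)


class SequenceStore:
    """Example component that provides a data retrieval service."""

    def __init__(self, platform: Platform) -> None:
        self._data = {"seq1": "ACGT"}
        platform.register_service("fetch_sequence", self._data.get)


class SequenceViewer:
    """Example component that reacts to selection events from any source."""

    def __init__(self, platform: Platform) -> None:
        self._platform = platform
        self.shown = None
        platform.subscribe("item_selected", self.on_selected)

    def on_selected(self, seq_id: str) -> None:
        self.shown = self._platform.request("fetch_sequence", seq_id)


platform = Platform()
SequenceStore(platform)
viewer = SequenceViewer(platform)
platform.publish("item_selected", "seq1")
print(viewer.shown)  # ACGT
```

Neither example component holds a reference to the other, so either could be replaced by a different provider or listener without changing the remaining code.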
- Bioinformatic components may further comprise taxonomic information, bibliographic information, protein structure, signaling pathways, gene expression information, etc., and still be considered part of the claimed invention. Accordingly, it is not intended that the present invention be limited to the specifics of the foregoing description of the preferred embodiment, but rather that it be limited only by the scope of the invention as defined in the claims appended hereto.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2002232761A AU2002232761A1 (en) | 2000-11-09 | 2001-11-09 | Integrated system for biological information |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US70915800A | 2000-11-09 | 2000-11-09 | |
US09/709,158 | 2000-11-09 |
Publications (3)
Publication Number | Publication Date |
---|---|
WO2002039486A2 (en) | 2002-05-16 |
WO2002039486A3 (en) | 2002-09-06 |
WO2002039486A9 (en) | 2002-10-17 |
Family
ID=24848713
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2001/049984 WO2002039486A2 (en) | 2000-11-09 | 2001-11-09 | Integrated system for biological information |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU2002232761A1 (en) |
WO (1) | WO2002039486A2 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113536501A (en) * | 2021-07-19 | 2021-10-22 | 西安流固动力科技有限公司 | Distributed numerical simulation comprehensive analysis platform based on cloud computing and construction method thereof |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5329619A (en) * | 1992-10-30 | 1994-07-12 | Software Ag | Cooperative processing interface and communication broker for heterogeneous computing environments |
US6125383A (en) * | 1997-06-11 | 2000-09-26 | Netgenics Corp. | Research system using multi-platform object oriented program language for providing objects at runtime for creating and manipulating biological or chemical data |
2001
- 2001-11-09: WO application PCT/US2001/049984 (WO2002039486A2), status: Application Discontinuation (not active)
- 2001-11-09: AU application AU2002232761A (AU2002232761A1), status: Abandoned (not active)
Also Published As
Publication number | Publication date |
---|---|
WO2002039486A2 (en) | 2002-05-16 |
AU2002232761A1 (en) | 2002-05-21 |
WO2002039486A3 (en) | 2002-09-06 |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
AK | Designated states | Kind code of ref document: C2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: C2. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
COP | Corrected version of pamphlet | Free format text: PAGES 1/11-11/11, DRAWINGS, REPLACED BY NEW PAGES 1/11-11/11; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
REG | Reference to national code | Ref country code: DE. Ref legal event code: 8642 |
122 | Ep: pct application non-entry in european phase | |
NENP | Non-entry into the national phase | Ref country code: JP |
WWW | Wipo information: withdrawn in national office | Country of ref document: JP |