WO2017011717A1

WO2017011717A1 - Domain-independent data representation using atoms, documents and connections

Info

Publication number: WO2017011717A1
Application number: PCT/US2016/042395
Authority: WO
Inventors: Bjoern Flemming BROBERG; Alberto MELACINI
Original assignee: Schlumberger Technology Corporation; Schlumberger Canada Limited; Services Petroliers Schlumberger; Geoquest Systems B.V.
Priority date: 2015-07-15
Filing date: 2016-07-15
Publication date: 2017-01-19

Abstract

Techniques for storing geographic information system (GIS) data in a domain-independent representation. The techniques can include storing GIS data in GIS atomic units, each including a globally unique identifier and a state including a GIS feature. The techniques can include storing non-GIS atomic units each including at least one document and at least one connection, where each document of non-GIS atomic units includes a globally unique identifier and a version identifier, where each connection of the non-GIS atomic units is directed from its parent non-GIS atomic unit to a child atomic unit, is non-reflexive, and comprises a parent globally unique identifier, a child globally unique identifier, a relation identification, and version applicability information. The techniques may persist, unless subjected to garbage collection, an initial version and all subsequent versions of each document of each of the first plurality of non-GIS atomic units.

Description

DOMAIN-INDEPENDENT DATA REPRESENTATION USING ATOMS, DOCUMENTS AND CONNECTIONS

Cross-Reference to Related Applications

[0001] This application claims priority to U.S. Provisional Patent Application Serial No. 62/192,804, which was filed on July 15, 2015 and is incorporated herein by reference in its entirety.

Background

[0002] In a repository that supports a variety of applications (potentially with different data modeling features), data may be represented in the backend in a domain-agnostic (application-agnostic) manner. A domain-agnostic representation has no domain-model-specific elements. If the data is not represented in a domain-agnostic manner, the repository may be forced to change whenever there are changes in the domain modeling, which may be frequent. With this approach, the responsibility of mapping domain-specific modeling to its internal (backend) representation is left to the application/domain.

[0003] Many different solutions and technologies have been developed for this challenge, such as object-relational mapping, object-oriented databases, object database management systems, and NoSQL databases. In particular, object-relational mapping (ORM) manages the mapping details between a set of objects and underlying relational databases, XML repositories or similar. ORM changes can incorporate new technology and capability and may not call for changes to the code for related applications. Object-oriented databases or object database management systems (ODBMS) store objects rather than data such as integers, strings or real numbers. NoSQL databases provide a mechanism for storage and retrieval that differs from the tabular relations used in relational databases. These databases are motivated by simplicity of design and potentially better horizontal scaling. The data structures used in NoSQL databases differ slightly from those used in relational databases and can be split into the following sub-categories: key-value, graph, and document.

Summary

[0004] According to some embodiments, a system for storing geographic information system (GIS) data in a domain-independent representation is disclosed. The system includes an electronic hardware data repository communicatively coupled to a computer network, the electronic hardware data repository including at least one electronic persistent memory device; where the electronic persistent memory device stores GIS data in a plurality of GIS atomic units each including a globally unique identifier and a state including a GIS feature; where the electronic persistent memory device further stores a first plurality of non-GIS atomic units each including at least one document and at least one connection, where each document of the first plurality of non-GIS atomic units includes a globally unique identifier and a version identifier, and where each connection of the first plurality of non-GIS atomic units is directed from its parent non-GIS atomic unit to a child atomic unit, is non-reflexive, and includes a parent globally unique identifier, a child globally unique identifier, a relation identification, and version applicability information; where a second plurality of the first plurality of non-GIS atomic units each include at least one connection to a GIS atomic unit child, each of the second plurality of non-GIS atomic units including data representing a physical property value of a GIS feature of a connected GIS atomic unit child; where the hardware data repository is configured to persist, unless subjected to garbage collection, an initial version and all subsequent versions of each document of each of the first plurality of non-GIS atomic units; where the hardware data repository is configured to garbage collect at least the document of GIS atomic units that are not connected to a persisted atomic unit as a child; and where the hardware data repository is configured to generate and persist a later version of any altered atomic unit and any ancestor atomic units thereof.

[0005] Various optional features of the above embodiments include the following. The system may include a publicly-available plugin module, where a client side of the plugin module includes a domain-oriented interface, and where a server side of the plugin module includes the electronic persistent memory device. The system may include at least one electronic processor communicatively coupled to the electronic persistent memory device, where the client side of the plugin module includes controls configured to permit a client side user to cause the at least one electronic processor to execute a server side service on at least a portion of data stored in the electronic hardware repository. The version applicability information may include information identifying at least one version of a parent atomic unit for which a respective connection is valid and information identifying at least one version of a child atomic unit for which a respective connection is valid. A plurality of GIS features may be in serialized vector format. A plurality of GIS features may represent at least one of a point, polyline, polygon, or multipart feature. Each of the second plurality of non-GIS atomic units may include data representing a physical property value of pressure, temperature, flow rate, porosity, or chemical composition. The plurality of GIS atomic units and the first plurality of non-GIS atomic units may be stored in at least one of: a relational database, a graph database, a document database, or a key value storage. The GIS atomic units may further include version information. A version of a stored model consisting of a connected plurality of the atomic units may be represented by version information in a root node of the connected plurality of atomic units.

[0006] According to some embodiments, a method of storing geographic information system (GIS) data in a domain-independent representation is disclosed. The method includes accessing an electronic hardware data repository communicatively coupled to a computer network, the electronic hardware data repository including at least one electronic persistent memory device; storing, in the electronic persistent memory device, GIS data in a plurality of GIS atomic units each including a globally unique identifier and a state including a GIS feature; storing, in the electronic persistent memory device, a first plurality of non-GIS atomic units each including at least one document and at least one connection, where each document of the first plurality of non-GIS atomic units includes a globally unique identifier and a version identifier, and where each connection of the first plurality of non-GIS atomic units is directed from its parent non-GIS atomic unit to a child atomic unit, is non-reflexive, and includes a parent globally unique identifier, a child globally unique identifier, a relation identification, and version applicability information; storing, in the electronic persistent memory device, a second plurality of the first plurality of non-GIS atomic units each including at least one connection to a GIS atomic unit child, each of the second plurality of non-GIS atomic units including data representing a physical property value of a GIS feature of a connected GIS atomic unit child; persisting, unless subjected to garbage collection, an initial version and all subsequent versions of each document of each of the first plurality of non-GIS atomic units; garbage collecting at least the document of GIS atomic units that are not connected to a persisted atomic unit as a child; and generating and persisting a later version of any altered atomic unit and any ancestor atomic units thereof.

[0007] Various optional features of the above embodiments include the following. The method may further include providing a publicly-available plugin module, where a client side of the plugin module includes a domain-oriented interface, and where a server side of the plugin module includes the electronic persistent memory device. The method may further include providing at least one electronic processor communicatively coupled to the electronic persistent memory device, where the client side of the plugin module includes controls configured to permit a client side user to cause the at least one electronic processor to execute a server side service on at least a portion of data stored in the electronic hardware repository. The version applicability information may include information identifying at least one version of a parent atomic unit for which a respective connection is valid and information identifying at least one version of a child atomic unit for which a respective connection is valid. A plurality of GIS features may be in serialized vector format. A plurality of GIS features may represent at least one of a point, polyline, polygon, or multipart feature. Each of the second plurality of non-GIS atomic units may include data representing a physical property value of pressure, temperature, flow rate, porosity, or chemical composition. The plurality of GIS atomic units and the first plurality of non-GIS atomic units may be stored in at least one of: a relational database, a graph database, a document database, or a key value storage. The GIS atomic units may further include version information. A version of a stored model consisting of a connected plurality of the atomic units may be represented by version information of a root node of the connected plurality of atomic units.

[0008] The invention as claimed has many benefits when compared with prior art electronic data storage techniques. In comparison with prior art techniques, the claimed invention improves the operation of a computer system used to store data by, for example, achieving data-usage agnosticism, increased flexibility, faster search times, smaller memory requirements, and better performance in a distributed environment. Moreover, the claimed invention provides data to multiple entities in a manner that is agnostic as to how the entities utilize the data. The improvements relative to the prior art include improvements defined by logical structures and processes. In particular, the claimed invention provides a specific implementation of a solution to a problem in the software arts, namely, storing large amounts of data in a domain-independent way such that multiple entities utilizing a variety of software applications may access and use the data. Some embodiments abstract the underlying storage technologies, allowing an implementation to utilize any storage technologies available.

[0009] It will be appreciated that this summary is intended merely to introduce some aspects of the present methods, systems, and media, which are more fully described and/or claimed below. Accordingly, this summary is not intended to be limiting.

Brief Description of the Drawings

[0010] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the present teachings and together with the description, serve to explain the principles of the present teachings.

[0011] Figure 1 illustrates an example of a system that includes various management components to manage various aspects of a geologic environment, according to an embodiment.

[0012] Figure 2 illustrates a simple model: Atom X consisting of three leaves A, B, C.

[0013] Figure 3 illustrates an effect of editing Atom C.

[0014] Figure 4 illustrates an effect of removing the connection to Atom B.

[0015] Figure 5 illustrates an effect of adding a new Atom D.

[0016] Figure 6 illustrates a cumulative effect of editing/adding/deleting an Atom.

[0017] Figure 7 illustrates garbage collection once version 1 of X is dropped.

[0018] Figure 8 illustrates a depth-3 model.

[0019] Figure 9 illustrates versioning a leaf-atom change.

[0020] Figure 10 illustrates versioning for a non-leaf-atom change.

[0021] Figure 11 illustrates model decomposition granularity: coarser (left) vs. finer (right).

[0022] Figure 12 illustrates a model in a multi-user scenario.

[0023] Figure 13 illustrates a model with cyclic dependencies undergoing a change.

[0024] Figure 14 illustrates a sequence of condensing a cyclic dependency in a model.

[0025] Figure 15 illustrates an atomic representation with GIS support.

[0026] Figure 16 depicts a flowchart according to various embodiments.

[0027] Figure 17 illustrates a schematic view of a computing system, according to an embodiment.

Detailed Description

[0028] Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings and figures. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.

[0029] It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are used to distinguish one element from another. For example, a first object could be termed a second object, and, similarly, a second object could be termed a first object, without departing from the scope of the disclosure.

[0030] The terminology used in the description of the invention herein is for the purpose of describing particular embodiments and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms "a," "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any possible combinations of one or more of the associated listed items. It will be further understood that the terms "includes," "including," "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Further, as used herein, the term "if" may be construed to mean "when" or "upon" or "in response to determining" or "in response to detecting," depending on the context.

[0031] Attention is now directed to processing procedures, methods, techniques, and workflows that are in accordance with some embodiments. Some operations in the processing procedures, methods, techniques, and workflows disclosed herein may be combined and/or the order of some operations may be changed.

[0032] Implementations of the present disclosure may provide a domain-independent data representation that generally combines characteristics from both graph and document oriented databases. Further, embodiments may include a unified centralized repository infrastructure in a step-wise-transition where new capabilities are added to the repository infrastructure without having to change the related client application code, and where the domain can maintain domain models without requesting changes to the common generic repository infrastructure.

[0033] The domain-independent data representation may include an Atom as the smallest addressable piece of information that can be persisted and is uniquely identified by a globally unique identifier (GUID). An Atom includes serializable information (e.g., as a Document) and Connections to other Atoms. Connections between Atoms may be non-reflexive, non-cyclic, parent-child relationships (Atoms with no children are referred to as leaf-Atoms, following the graph representation of the connections). Further, the Atoms may be considered nodes in the graphs, and the graphs may be persisted outside the Documents, allowing the traversal of the data structure without de-serializing any Document. Connections between the Atoms may be modeled as named (Attribute, Value) pairs and their aggregation (i.e., lists of-, maps of-, etc.). An Atom has a State. The Atom State may be persisted. Persisted States are immutable and uniquely identified by their GUID and the Version. Atoms are not immutable (they can and do change), but their persisted States are always immutable. Unreferenced (i.e., no-longer required) versions are garbage-collected.
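The relationship between a mutable Atom and its immutable persisted States can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the class names `Atom`, `PersistedState` and `Connection`, the `persist` and `connect` methods, the use of JSON as the serialization format, and the `"parts"` relation label are all hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
import json
import uuid


@dataclass(frozen=True)
class Connection:
    """Directed parent-to-child link, named from the parent's perspective."""
    relation: str        # hypothetical label, e.g. "parts"
    child_guid: str
    child_version: int


@dataclass(frozen=True)
class PersistedState:
    """Immutable snapshot of an Atom, identified by (guid, version)."""
    guid: str
    version: int
    document: str        # serialized Document (JSON is an assumption)
    connections: tuple = ()


class Atom:
    """Mutable Atom; every call to persist() adds one immutable State."""

    def __init__(self, document, guid=None):
        self.guid = guid or str(uuid.uuid4())
        self.document = document
        self.connections = []
        self.history = {}    # version -> PersistedState

    def connect(self, relation, child_state):
        if child_state.guid == self.guid:
            raise ValueError("connections are non-reflexive")
        self.connections.append(
            Connection(relation, child_state.guid, child_state.version))

    def persist(self):
        version = len(self.history) + 1
        state = PersistedState(self.guid, version,
                               json.dumps(self.document, sort_keys=True),
                               tuple(self.connections))
        self.history[version] = state
        return state


leaf = Atom({"name": "A"})
leaf_v1 = leaf.persist()
root = Atom({"name": "X"})
root.connect("parts", leaf_v1)
root_v1 = root.persist()
root.document["name"] = "X (edited)"   # the Atom itself may change
root_v2 = root.persist()               # earlier States stay untouched
```

Because each State is a frozen value keyed by GUID and Version, it can be cached or shared freely, which matches the caching argument made in the surrounding text.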

[0034] Connections are part of the Atom State and hence are part of the persisted Atom's immutability. Persisted Atom States can be connected to other immutable persisted Atom States, which facilitates referential integrity. Documents and Connections may be persisted differently according to the storage medium and storage technology. Because of the immutability, the persisted Atom States can effectively be cached at different levels. In order to provide full scalability, the persisted model may be split into directed acyclic graphs (DAGs). There are several approaches to support cyclic data models. One approach is to model each entire cycle as a single Atom with multiple named documents.
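The acyclicity requirement and the idea of folding a cycle into a single Atom can be illustrated as follows. This is a minimal sketch, not the disclosed method: connections are reduced to plain `(parent, child)` name pairs, and the function names `is_valid_dag` and `condense` are hypothetical. It relies on Python's standard `graphlib` module (Python 3.9 or later) for cycle detection.

```python
from graphlib import TopologicalSorter, CycleError


def is_valid_dag(connections):
    """Return True if the (parent, child) pairs are non-reflexive and acyclic."""
    graph = {}
    for parent, child in connections:
        if parent == child:
            return False                       # reflexive connection
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set())
    try:
        TopologicalSorter(graph).prepare()     # raises CycleError on a cycle
    except CycleError:
        return False
    return True


def condense(connections, cycle_members, merged_name):
    """Replace every member of a cycle with one merged node."""
    members = set(cycle_members)

    def rename(node):
        return merged_name if node in members else node

    renamed = {(rename(p), rename(c)) for p, c in connections}
    # Links inside the cycle collapse to self-links and are dropped.
    return {(p, c) for p, c in renamed if p != c}


cyclic = [("X", "A"), ("A", "B"), ("B", "A")]
condensed = condense(cyclic, {"A", "B"}, "AB")
```

In this sketch the merged node `"AB"` stands for a single Atom that would carry the documents of both `A` and `B` as named documents; how those documents are stored is left open here.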

[0035] Before turning to a detailed explanation of example implementations, an example context is presented in reference to Figure 1.

[0036] Figure 1 illustrates an example of a system 100 that includes various management components 110 to manage various aspects of a geologic environment 150 (e.g., an environment that includes a sedimentary basin, a reservoir 151, one or more faults 153-1, one or more geobodies 153-2, etc.). For example, the management components 110 may allow for direct or indirect management of sensing, drilling, injecting, extracting, etc., with respect to the geologic environment 150. In turn, further information about the geologic environment 150 may become available as feedback 160 (e.g., optionally as input to one or more of the management components 110).

[0037] In the example of Figure 1, the management components 110 include a seismic data component 112, an additional information component 114 (e.g., well/logging data), a processing component 116, a simulation component 120, an attribute component 130, an analysis/visualization component 142 and a workflow component 144. In operation, seismic data (e.g., geographic information system, or "GIS", data) and other information provided per the components 112 and 114 may be input to the simulation component 120.

[0038] In an example embodiment, the simulation component 120 may rely on entities 122. Entities 122 may include earth entities or geological objects such as wells, surfaces, bodies, reservoirs, etc. In the system 100, the entities 122 can include virtual representations of actual physical entities that are reconstructed for purposes of simulation. The entities 122 may include entities based on data acquired via sensing, observation, etc. (e.g., the seismic data 112 and other information 114). An entity may be characterized by one or more properties (e.g., a geometrical pillar grid entity of an earth model may be characterized by a porosity property). Such properties may represent one or more measurements (e.g., acquired data), calculations, etc. An entity may be GIS data, for example.

[0039] In an example embodiment, the simulation component 120 may operate in conjunction with a software framework such as an object-based framework. In such a framework, entities may include entities based on pre-defined classes to facilitate modeling and simulation. A commercially available example of an object-based framework is the MICROSOFT® .NET® framework (Redmond, Washington), which provides a set of extensible object classes. In the .NET® framework, an object class encapsulates a module of reusable code and associated data structures. Object classes can be used to instantiate object instances for use by a program, script, etc. For example, borehole classes may define objects for representing boreholes based on well data.

[0040] In the example of Figure 1, the simulation component 120 may process information to conform to one or more attributes specified by the attribute component 130, which may include a library of attributes. Such processing may occur prior to input to the simulation component 120 (e.g., consider the processing component 116). As an example, the simulation component 120 may perform operations on input information based on one or more attributes specified by the attribute component 130. In an example embodiment, the simulation component 120 may construct one or more models of the geologic environment 150, which may be relied on to simulate behavior of the geologic environment 150 (e.g., responsive to one or more acts, whether natural or artificial). In the example of Figure 1, the analysis/visualization component 142 may allow for interaction with a model or model-based results (e.g., simulation results, etc.). As an example, output from the simulation component 120 may be input to one or more other workflows, as indicated by a workflow component 144.

[0041] As an example, the simulation component 120 and/or associated domain may include one or more features of a simulator such as the ECLIPSE™ reservoir simulator (Schlumberger Limited, Houston, Texas), the INTERSECT™ reservoir simulator (Schlumberger Limited, Houston, Texas), etc. As an example, a simulation component, a simulator, etc. may include features to implement one or more meshless techniques (e.g., to solve one or more equations, etc.). As an example, a reservoir or reservoirs may be simulated with respect to one or more enhanced recovery techniques (e.g., consider a thermal process such as SAGD, etc.).

[0042] In an example embodiment, the management components 110 may include features of a commercially available framework such as the PETREL® seismic to simulation software framework (Schlumberger Limited, Houston, Texas). The PETREL® framework provides components that allow for optimization of exploration and development operations. The PETREL® framework includes seismic to simulation software components that can output information for use in increasing reservoir performance, for example, by improving asset team productivity. Through use of such a framework, various professionals (e.g., geophysicists, geologists, and reservoir engineers) can develop collaborative workflows and integrate operations to streamline processes. Such a framework may be considered an application and may be considered a data-driven application (e.g., where data is input for purposes of modeling, simulating, etc.).

[0043] In an example embodiment, various aspects of the management components 110 may include add-ons or plug-ins that operate according to specifications of a framework environment. For example, a commercially available framework environment marketed as the OCEAN® framework environment (Schlumberger Limited, Houston, Texas) allows for integration of add-ons (or plug-ins) into a PETREL® framework workflow. The OCEAN® framework environment leverages .NET® tools (Microsoft Corporation, Redmond, Washington) and offers stable, user-friendly interfaces for efficient development. In an example embodiment, various components may be implemented as add-ons (or plug-ins) that conform to and operate according to specifications of a framework environment (e.g., according to application programming interface (API) specifications, etc.).

[0044] Figure 1 also shows an example of a framework 170 that includes a model simulation layer 180 along with a framework services layer 190, a framework core layer 195 and a modules layer 175. The framework 170 may include the commercially available OCEAN® framework where the model simulation layer 180 is the commercially available PETREL® model-centric software package that hosts OCEAN® framework applications. In an example embodiment, the PETREL® software may be considered a data-driven application. The PETREL® software can include a framework for model building and visualization.

[0045] As an example, a framework may include features for implementing one or more mesh generation techniques. For example, a framework may include an input component for receipt of information from interpretation of seismic data, one or more attributes based at least in part on seismic data, log data, image data, etc. Such a framework may include a mesh generation component that processes input information, optionally in conjunction with other information, to generate a mesh.

[0046] In the example of Figure 1, the model simulation layer 180 may provide domain objects 182, act as a data source 184, provide for rendering 186 and provide for various user interfaces 188. Rendering 186 may provide a graphical environment in which applications can display their data while the user interfaces 188 may provide a common look and feel for application user interface components.

[0047] As an example, the domain objects 182 can include entity objects, property objects and optionally other objects. Entity objects may be used to geometrically represent wells, surfaces, bodies, reservoirs, etc., while property objects may be used to provide property values as well as data versions and display parameters. For example, an entity object may represent a well where a property object provides log information as well as version information and display information (e.g., to display the well as part of a model).

[0048] In the example of Figure 1, data may be stored in one or more data sources (or data stores, generally physical data storage devices), which may be at the same or different physical sites and accessible via one or more networks. The model simulation layer 180 may be configured to model projects. As such, a particular project may be stored where stored project information may include inputs, models, results and cases. Thus, upon completion of a modeling session, a user may store a project. At a later time, the project can be accessed and restored using the model simulation layer 180, which can recreate instances of the relevant domain objects.

[0049] In the example of Figure 1, the geologic environment 150 may include layers (e.g., stratification) that include a reservoir 151 and one or more other features such as the fault 153-1, the geobody 153-2, etc. As an example, the geologic environment 150 may be outfitted with any of a variety of sensors, detectors, actuators, etc. For example, equipment 152 may include communication circuitry to receive and to transmit information with respect to one or more networks 155. Such information may include information associated with downhole equipment 154, which may be equipment to acquire information, to assist with resource recovery, etc. Other equipment 156 may be located remote from a well site and include sensing, detecting, emitting or other circuitry. Such equipment may include storage and communication circuitry to store and to communicate data, instructions, etc. As an example, one or more satellites may be provided for purposes of communications, data acquisition, etc. For example, Figure 1 shows a satellite in communication with the network 155 that may be configured for communications, noting that the satellite may additionally or alternatively include circuitry for imagery (e.g., spatial, spectral, temporal, radiometric, etc.).

[0050] Figure 1 also shows the geologic environment 150 as optionally including equipment 157 and 158 associated with a well that includes a substantially horizontal portion that may intersect with one or more fractures 159. For example, consider a well in a shale formation that may include natural fractures, artificial fractures (e.g., hydraulic fractures) or a combination of natural and artificial fractures. As an example, a well may be drilled for a reservoir that is laterally extensive. In such an example, lateral variations in properties, stresses, etc. may exist where an assessment of such variations may assist with planning, operations, etc. to develop a laterally extensive reservoir (e.g., via fracturing, injecting, extracting, etc.). As an example, the equipment 157 and/or 158 may include components, a system, systems, etc. for fracturing, seismic sensing, analysis of seismic data, assessment of one or more fractures, etc.

[0051] As mentioned, the system 100 may be used to perform one or more workflows. A workflow may be a process that includes a number of worksteps. A workstep may operate on data, for example, to create new data, to update existing data, etc. As an example, a workstep may operate on one or more inputs and create one or more results, for example, based on one or more algorithms. As an example, a system may include a workflow editor for creation, editing, executing, etc. of a workflow. In such an example, the workflow editor may provide for selection of one or more predefined worksteps, one or more customized worksteps, etc. As an example, a workflow may be a workflow implementable in the PETREL® software, for example, that operates on seismic data, seismic attribute(s), etc. As an example, a workflow may be a process implementable in the OCEAN® framework. As an example, a workflow may include one or more worksteps that access a module such as a plug-in (e.g., external executable code, etc.).

[0052] As used herein, the term "domain" means a software application or software suite that accepts data for processing. Thus, the above includes descriptions of several domains. In general, the term "framework" is synonymous with "domain" as used herein. The framework with which simulation component 120 operates, Microsoft's .NET, Schlumberger Limited's PETREL®, Schlumberger Limited's OCEAN®, Schlumberger's STUDIO E&P KNOWLEDGE ENVIRONMENT™, and framework 170 of Figure 1 are all examples of domains.

[0053] Further, the above description of Figure 1 embraces many components that may benefit from example implementations. Such components include, for example, seismic data 112, other information 114, entities 122, attributes 130, domain objects 182, and data source 184. These components may be altered to benefit from disclosed implementations, or replaced by disclosed implementations. In more detail, the data stored in any of these components may be stored instead in disclosed embodiments, thereby ensuring that the stored data may be used by multiple domains.

[0054] Examples of Data Representation According to Some Implementations

[0055] A few objects, different in nature, will be considered and represented: diagrammatically, using Directed Acyclic Graphs (a.k.a. Acyclic Oriented Graphs) and in a relational database (using tables). Moreover, the following discloses how operations on the object will be performed, while showing how its representation changes accordingly.

[0056] Notational conventions may include that a round element indicates an Atom with its Document part; an oriented arc, sometimes represented as an arrow "->", represents the parent-to-child connection between two Atoms; and labels on the arcs identify a logical grouping of the Connections to the child Atoms, from the parent perspective. Parents of parents are referred to as "grandparents", and parents of grandparents are referred to as "great-grandparents". Parents, grandparents, great-grandparents, and so on, are referred to as "ancestors".

[0057] Example 1: Persistence of a simple Model in a Relational Database

[0058] Figure 2 illustrates an example of a simple model. Define and persist object X 202 and related objects (e.g., parts of X 202) A 204, B 206 and C 208. X 202 and its constituents are persisted with their initial version set to 1. As described later herein, A 204, B 206, and C 208 may be geographic information systems (GIS) data, and X 202 may represent, for example, a project, a wellsite, a container, a model, a country, a region, an oilfield, etc. Note that although Figure 2 is depicted in terms of a DAG, the same information can be represented in a relational database by means of database tables, as elaborated presently.

[0059] In the following examples, two tables (ATOM and CONNECTIONS) are sufficient to express the intended concepts. In a real implementation, a more articulated data structure may be provided. When describing more advanced concepts, the set of columns in the two tables may be augmented accordingly. Field types are purely symbolic (no assumptions on int, char, bool, etc.)

[0060] The tables are: ATOM, containing information about atoms A, B, C and X (i.e. the graph nodes), with one row per persisted atom; and CONNECTION representing the parent to child relationship between X (parent) and A, B, C (children), with one row per atom connection.

ATOM has the following columns:

• GUID: Atom Global Unique ID

• VERSION: uniquely identifies the persisted State of the atom GUID

• DATA: serialized information containing the atom Document

• REF MASK: indicates that additional information is needed (from other tables, if any). Its bit-configuration indicates what the additional information required is.

CONNECTION has the following columns:

• PARENT GUID

• FROM VERSION: parent Atom version from which this connection is valid

• TO VERSION: parent Atom version until which this connection is valid

• PROPERTY NAME: logical grouping to which this connection belongs - it identifies a relation between atoms

• PROPERTY KEY: when non-NULL indicates that PROPERTY NAME is of Dictionary/Map type

• CHILD GUID: GUID of the atom pointed to by PARENT GUID

• CHILD VERSION: VERSION of the atom pointed to by PARENT GUID

[0061] TO VERSION is initially NULL, indicating that the connection is shared by all versions of PARENT GUID starting from FROM VERSION. A non-NULL value shows that the connection is no longer valid for higher versions of PARENT GUID.

[0062] This may be an enhanced versioning scheme, whereby different versions of the object share their common parts, without unnecessary cloning of information.

[0063] The above gives a further advantage in the likely cases when only a few connections are changed from one version to the next, e.g., typical in large objects consisting of several connected atoms. The above description results in ATOM:

and CONNECTION (some column names are shortened for ease of presentation):

[0064] From the above tables, the following considerations stem: In ATOM, X contains references to CONNECTION, unlike A, B, C (leaf-nodes). Because the entire aggregate is persisted for the first time, the version of its components is set to 1. There is a one-to-many relationship, l_a, between X and {A, B}. l_b is a one-to-one relationship between X and C. KEY is NULL: the relations in this case (one-to-one, one-to-many) are key-less. Operations on X and the resulting changes to its representation, both diagrammatically and in the database tables, may be carried out. After analyzing the effect on X of each individual operation, the cumulative effect (i.e., side-effect) of several consecutive operations may also be shown.
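As an illustration only, the two tables for the initial state of Figure 2 may be sketched in Python as follows. The class names, the field names, the placeholder Document strings, and the REF MASK value 1 for X are assumptions made for this sketch; the rows themselves follow from the considerations above (all components at version 1, relation l_a to A and B, relation l_b to C, KEY set to NULL).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AtomRow:
    guid: str
    version: int
    data: str                 # serialized Document
    ref_mask: Optional[int]   # None stands in for NULL


@dataclass(frozen=True)
class ConnectionRow:
    parent_guid: str
    from_version: int
    to_version: Optional[int]     # None (NULL): connection still valid
    property_name: str
    property_key: Optional[str]   # None (NULL): key-less relation
    child_guid: str
    child_version: int


# Initial persistence of Figure 2: every component starts at version 1.
# The REF MASK value 1 for X is a hypothetical "see CONNECTION" flag.
ATOM = [
    AtomRow("X", 1, "<..Document..>", 1),
    AtomRow("A", 1, "<..Document..>", None),
    AtomRow("B", 1, "<..Document..>", None),
    AtomRow("C", 1, "<..Document..>", None),
]
CONNECTION = [
    ConnectionRow("X", 1, None, "l_a", None, "A", 1),
    ConnectionRow("X", 1, None, "l_a", None, "B", 1),
    ConnectionRow("X", 1, None, "l_b", None, "C", 1),
]


def relations(parent_guid, rows):
    """Group the children of one parent by PROPERTY NAME."""
    grouped = defaultdict(list)
    for row in rows:
        if row.parent_guid == parent_guid:
            grouped[row.property_name].append(row.child_guid)
    return dict(grouped)
```

Calling relations("X", CONNECTION) groups A and B under l_a and C under l_b, which mirrors the one-to-many and one-to-one relations described above.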

[0065] Elementary Operations

[0066] Three elementary operations applied to the representation of Figure 2 are presented next: Edit an element of X, Remove an element of X, Add a new element to X. Because they are three disjoint operations, each one, applied to X version 1, will produce X version 2.

[0067] First Elementary Operation Example: X₂ = X₁.Edit(C)

[0068] Figure 3 illustrates an effect of editing Atom C 208. Note that as used herein, versions are denoted as subscripts. Because of the immutability of persisted Atoms, the operation leaves the previous version of C 208 untouched and creates a new version: C₁ (208) -> C₂ (312). C₁ 208 still remains in the system, as other parties/users may be referring to it (e.g., X₁ 202). Because X no longer connects to C₁ 208, X₂ 310 is generated. The resulting graph is shown in Figure 3. The resulting tables are as follows:

ATOM

CONNECTION

[0069] CONNECTION shows that the original l_b connection (oriented arc X₁ -> C₁ in the diagram) is no longer valid after version 1 of X.

[0070] Second Elementary Operation Example: X₂ = X₁.Remove(B)

[0071] Figure 4 illustrates an effect of removing the connection to Atom B 206. Following the previous considerations, the diagram of X₂ 410 is shown in Figure 4. Likewise, for the tables:

ATOM

CONNECTION

[0072] Third Elementary Operation Example: X₂ = X₁.Add(D)

[0073] Figure 5 illustrates an effect of adding a new Atom D 512. Finally, X₂ 510 may be the result of aggregating a new element (D 512) to X₁ 202, as shown in Figure 5.

ATOM

CONNECTION

[0074] A new element (D 512 at version 1) has been added to X 202, 510: the connection X₂ 510 -> D₁ 512 is valid from version 2 of X 510 (see bottom row of the above CONNECTION table). Moreover, from a modeling point of view, no new relation was introduced by the addition of D 512, but the pre-existing l_a has been modified.
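The three elementary operations may be sketched, under stated assumptions, as functions over two in-memory tables. The function names, the list-based row layout, and the placeholder Documents are hypothetical; the sketch only mirrors the behavior described above: persisted Atom States are never overwritten, the parent receives a new version, and TO VERSION is set on a connection that stops being valid.

```python
def initial_state():
    """X at version 1 with children A, B (relation l_a) and C (relation l_b)."""
    atoms = {("X", 1): "doc X", ("A", 1): "doc A",
             ("B", 1): "doc B", ("C", 1): "doc C"}
    # CONNECTION rows: [parent, from_ver, to_ver, name, child, child_ver]
    conns = [["X", 1, None, "l_a", "A", 1],
             ["X", 1, None, "l_a", "B", 1],
             ["X", 1, None, "l_b", "C", 1]]
    return atoms, conns


def latest(atoms, guid):
    """Highest persisted version of the Atom with the given GUID."""
    return max(version for g, version in atoms if g == guid)


def new_parent_version(atoms, parent):
    """Persist a new parent State; the previous State is left untouched."""
    old = latest(atoms, parent)
    atoms[(parent, old + 1)] = atoms[(parent, old)]
    return old, old + 1


def close_connection(conns, parent, child, last_valid):
    """Set TO VERSION on the still-open connection parent -> child."""
    for row in conns:
        if row[0] == parent and row[4] == child and row[2] is None:
            row[2] = last_valid
            return row
    raise KeyError((parent, child))


def edit(atoms, conns, parent, child, new_doc):
    """Create a new child State and reconnect the new parent State to it."""
    old, new = new_parent_version(atoms, parent)
    closed = close_connection(conns, parent, child, old)
    child_new = latest(atoms, child) + 1
    atoms[(child, child_new)] = new_doc
    conns.append([parent, new, None, closed[3], child, child_new])


def remove(atoms, conns, parent, child):
    """End the validity of the connection at the previous parent version."""
    old, _ = new_parent_version(atoms, parent)
    close_connection(conns, parent, child, old)


def add(atoms, conns, parent, name, child, doc):
    """Persist a new child at version 1 and connect the new parent State."""
    _, new = new_parent_version(atoms, parent)
    atoms[(child, 1)] = doc
    conns.append([parent, new, None, name, child, 1])
```

Applying any one of edit, remove, or add to the initial state yields X at version 2, and no existing row of the atom table is modified by any of the three.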

[0075] Cumulative Effect

[0076] Figure 6 illustrates a cumulative effect of editing/adding/deleting an Atom. In general, the elementary operations described above are likely to be carried out by several different actors, at different stages. One scenario is when a dataset is accessed by different users, each working on their own individual projects. Consider the case where the three elementary operations described above are carried out in succession on X, bringing it from version 1 to version 4, as shown in Figure 6 and the following tables. That is, C₁ 208 is edited, resulting in C₂ 616 and X₂ 610; the connection to B₁ 206 is removed, resulting in X₃ 612; and D₁ 618 is added, resulting in X₄ 614.

ATOM

GUID | VERSION | DATA | REF MASK
C | 1 | <..Document..> | NULL
C | 2 | <..Document..> | NULL
D | 1 | <..Document..> | NULL

CONNECTION

[0077] At this point, the information pertaining to the history of X is stored: X₁ 202, X₂ 610, X₃ 612, and X₄ 614 can be reconstructed by scanning the relevant portions of ATOM and CONNECTION.
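As a minimal sketch of such a reconstruction, the CONNECTION rows below are those implied by the narrative of Figure 6 (edit C, remove B, add D); they are a reconstruction for illustration, and the tuple layout and the function name children_at are assumptions. A connection is taken to apply to a parent version v when FROM VERSION <= v and TO VERSION is NULL or >= v.

```python
# (parent, from_ver, to_ver, name, child, child_ver); None stands in for NULL.
CONNECTION = [
    ("X", 1, None, "l_a", "A", 1),   # shared by every version of X
    ("X", 1, 2,    "l_a", "B", 1),   # removed when X3 was created
    ("X", 1, 1,    "l_b", "C", 1),   # superseded when C was edited
    ("X", 2, None, "l_b", "C", 2),
    ("X", 4, None, "l_a", "D", 1),   # added when X4 was created
]


def children_at(parent, version, rows=CONNECTION):
    """Child Atom States connected to `parent` at the given parent version."""
    return sorted(
        (child, child_ver)
        for p, frm, to, _name, child, child_ver in rows
        if p == parent and frm <= version and (to is None or version <= to)
    )


# Print the children of each of the four versions of X.
for v in range(1, 5):
    print(f"X{v}:", children_at("X", v))
```

Five rows are enough to describe four versions of X, because a connection that does not change is stored once and shared across versions.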

[0078] Atom Deletion and Garbage Collection

[0079] Persisted Atoms may not be explicitly deleted but (some of) their versions may become unreferenced as a result of the operations on the data. Elements may be tagged for garbage collection in accordance with a specified garbage collection policy. Examples of garbage collection policies include retaining the latest version of an Atom, keeping versions that are referenced (drop unreferenced ones), use of "labeling" to explicitly mark a (sub-)model to prevent it from being garbage collected, and the like.

[0080] The effects of garbage collecting an Atom State are several. They include removing the Document at the given version, and removing Connections to child Atom States (note that parent Atoms own the child connection(s)). Further, according to the adopted policy, some Atom States are marked for garbage collection, for example, orphan Atom States (i.e., Atom States that are not referenced by any other (persisted) Atom State).

[0081] Figure 7 illustrates garbage collection once version 1 of X is dropped. Select the following garbage collection policy: "unreferenced Atom States are tagged for collection unless PreventCollection(GUID, Version, TRUE) is called." In the scenario described in Figure 7, initially the directive PreventCollection(X, n, TRUE) is called for n = 1, 2, 3, 4. Later, the label preventing X₁ 202 from garbage collection is removed: PreventCollection(X, 1, FALSE). The consequences are shown in Figure 7. The X₁ 202 Document is tagged for removal. The Connections from X₁ 202 to A₁ 204, B₁ 206, and C₁ 208 are also tagged for removal (parent Atoms own the connections). C₁ 208, being unreferenced, is marked for garbage collection. The following tables may thus result:

ATOM

CONNECTION

[0082] As may be appreciated, X₁ 202 is garbage-collectable as a result of PreventCollection(X, 1, FALSE), and C₁ 208 is garbage-collectable as a result of being an unreferenced orphan. The connection X₂ 710 -> A₁ 204 is still alive (the TO VER field is empty). The connection X₂ 710 -> B₁ 206 cannot be dropped because it is valid until version 2 of X (and X₂ 710 is not collectable). The connection X₁ 202 -> C₁ 208, however, can be dropped: it was valid until version 1 and X₁ 202 is being retired.
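The policy of this scenario may be sketched as a reachability computation. This is an illustrative sketch, not a definitive collector: the function names, the tuple layout, and the set of protected States standing in for the PreventCollection labels are assumptions, and the CONNECTION rows are those implied by the cumulative scenario of Figure 6.

```python
STATES = {("X", 1), ("X", 2), ("X", 3), ("X", 4),
          ("A", 1), ("B", 1), ("C", 1), ("C", 2), ("D", 1)}

# (parent, from_ver, to_ver, name, child, child_ver); None stands in for NULL.
CONNECTION = [
    ("X", 1, None, "l_a", "A", 1),
    ("X", 1, 2,    "l_a", "B", 1),
    ("X", 1, 1,    "l_b", "C", 1),
    ("X", 2, None, "l_b", "C", 2),
    ("X", 4, None, "l_a", "D", 1),
]


def valid_for(row, guid, version):
    """True if the connection row applies to the State (guid, version)."""
    parent, frm, to = row[0], row[1], row[2]
    return parent == guid and frm <= version and (to is None or version <= to)


def garbage_collect(states, conns, protected):
    """Return (collectable Atom States, droppable connection rows)."""
    live = set(protected)
    pending = list(protected)
    while pending:                      # walk parent -> child from the labels
        guid, version = pending.pop()
        for row in conns:
            child = (row[4], row[5])
            if valid_for(row, guid, version) and child not in live:
                live.add(child)
                pending.append(child)
    dead_states = states - live
    # A row can be dropped only if no surviving parent version still uses it.
    dead_conns = [row for row in conns
                  if not any(valid_for(row, g, v) for g, v in live)]
    return dead_states, dead_conns


# State of the labels after PreventCollection(X, 1, FALSE).
protected = {("X", n) for n in (2, 3, 4)}
dead_states, dead_conns = garbage_collect(STATES, CONNECTION, protected)
```

With these inputs, the collectable States are the first versions of X and C, and the only droppable row is the connection from X to the first version of C; the rows pointing at A and B survive because X at version 2 still uses them.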

[0083] The connection may be modeled or represented explicitly, outside the Document part of the Atom. Otherwise, Atom dependency information would have to be extracted from the serialized information contained in the Document, and components like the Garbage Collector may not function reliably and/or efficiently.

[0084] Example 2: Persistence of Changes in a Relational Database

[0085] Figure 8 illustrates a depth-3 model. Example 1 above showed the representation of (and operations on) a data structure of depth 1 where child nodes were leaves. Example 2 will show the representation of a depth 2 object X 802, where children can be either leaf (C 804) or non-leaf objects (Y 806). Following the same notational conventions as before yields Figure 8.

ATOM

GUID | VERSION | DATA | REF MASK

CONNECTION

[0086] Next, consider the consequences of editing leaf- and non-leaf- Atoms on versioning and how this relates to the immutability.

[0087] Leaf-Atom Changed: A₁ -> A₂

[0088] Figure 9 illustrates versioning a leaf-atom change. As shown, if A₁ 902 is changed, it generates higher-version A₂ 904. This causes the version of parent atom Y₁ 806 to increment to Y₂ 906, and that of ancestor atom X₁ 802 to increment to X₂ 908.

ATOM

CONNECTION

[0089] Non-Leaf-Atom Changed: Y₁ -> Y₂

[0090] Figure 10 illustrates versioning for a non-leaf-atom change. As shown, if non-leaf atom Y₁ 806 is changed, it generates new version Y₂ 1002. This causes the generation of new version X₂ 1004 of X₁ 802. The tables below reflect these observations.

ATOM

CONNECTION

[0091] In Figure 10, because the connections from Y₂ 1002 to A₁ 1006 and B₁ 1008 have not changed, what has changed is the serialized information of Y₁ 806. From the two scenarios presented above in reference to Figures 9 and 10, some considerations follow. Versioning changes propagate from child to parent (i.e., direction of dependencies) but not the other way around: changing Y₁ 806 changes its parent X₁ 802 but not its children A₁ 1006, B₁ 1008. Because an Atom owns the connections to its children, when a connection changes, the owning Atom State will be different, hence it will end up with a higher version.

[0092] Atom Storage Optimization: Separation of Document and Connections

[0093] The Atomic representation of the persisted model introduced in earlier sections can be further optimized by storing the Document part of the Atom outside the Atom structure while the Atom holds a reference to the (externalized) Document.

[0094] Persisted Documents may also have immutable behavior.

[0095] In instantiations in which documents and connections are separated, the database representation of the persisted model may have an extra table (called DOCUMENT) where Documents are stored. Its fields may be:

• GUID: Atom Global Unique ID (also used to identify Document)

• DOC VER: Document specific version

• DATA: Document content as serialized information (previously part of the ATOM table).

The ATOM table may be as follows:

• GUID: Atom Global Unique ID

• VERSION: uniquely identifies the atomic structure (in conjunction with GUID above)

• DOC VER: together with GUID identifies the Atom Document (in the DOCUMENT table)

• REF MASK: (unchanged)

For such instantiations, no changes may be made in the CONNECTION table.

[0096] Because DOC VER may only increase when the Atom Document changes, DOC VER ≤ VERSION. That is, because atoms own their connections, if a connection changes, then so too does the parent atom, which gets a higher version number. Therefore, if a connection changes and the atom's document does not, the atom's VERSION increases while its DOC VER stays the same.

[0097] Example 2a: Optimized Atom Storage

[0098] The starting structure of Example 2 (Figure 8) may be represented as:

ATOM

DOCUMENT

CONNECTION (as in Example 2)

[0099] An immediate advantage is the enhanced document storage when changes in a child node trigger a new version of its ancestors (e.g., up to the top root Atom). As an example, the "Leaf-Atom Changed: A₁ -> A₂" scenario of Example 2 and the resulting Figure 9 may be considered. The table representation may now become:

ATOM

DOCUMENT

CONNECTION (as in Example 2)

[0100] In essence, only the information that has changed is stored. Because of the enhanced document storage, garbage collection policies can be less strict: uncollected garbage becomes less cumbersome.
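The effect of separating Documents from Atoms may be illustrated with the following sketch of the leaf-atom change in the model of Figure 8 (X with children Y and C, Y with children A and B). The dictionary layout and the function names are assumptions for this example. A Document change on A creates one new DOCUMENT row, while Y and X receive a new VERSION and keep their DOC VER.

```python
# Parent links of the Figure 8 model: X -> {Y, C}, Y -> {A, B}.
PARENTS = {"A": ["Y"], "B": ["Y"], "Y": ["X"], "C": ["X"], "X": []}


def initial_tables():
    """Every Atom starts at VERSION 1 with DOC VER 1."""
    atom = {guid: [(1, 1)] for guid in PARENTS}    # guid -> [(VERSION, DOC VER)]
    document = {(guid, 1): f"doc {guid}" for guid in PARENTS}
    return atom, document


def change_document(guid, data, atom, document):
    """Persist a new Document for `guid` and re-version its ancestors."""
    version, doc_ver = atom[guid][-1]
    document[(guid, doc_ver + 1)] = data
    atom[guid].append((version + 1, doc_ver + 1))
    pending, seen = list(PARENTS[guid]), set()
    while pending:
        ancestor = pending.pop()
        if ancestor in seen:
            continue
        seen.add(ancestor)
        a_version, a_doc_ver = atom[ancestor][-1]
        # New atomic structure, same Document: only VERSION increases.
        atom[ancestor].append((a_version + 1, a_doc_ver))
        pending.extend(PARENTS[ancestor])


atom, document = initial_tables()
change_document("A", "doc A (edited)", atom, document)
```

After the call, ATOM holds three new rows (for A, Y and X) whereas DOCUMENT holds a single new row, and DOC VER never exceeds VERSION for any row.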

[0101] Another advantage is the semantic separation between the persisted Documents and their organization. Should, in the future, a richer interconnection structure be required, the DOCUMENT table might not change. For example, modeling cyclic relationships amongst Atoms may not be supported by DAGs. If a richer structure is called for, the ATOM and CONNECTION tables may be involved and the cyclically repeated Documents may be stored once. Immutability is preserved throughout. In an enhanced model, cyclic islands can easily be modelled as ATOMs with multiple named documents that can be updated individually. In some embodiments, the domain model may contain several cyclic dependencies and can be persisted as one single project ATOM with multiple documents. Non-cyclic parts can be modelled as separate DAGs referenced. The separate DAGs can very easily be shared between projects.

[0102] Domain Modeling with an Atom-based Data Representation

[0103] The data representation described earlier is completely domain independent. That is, it may not contain any domain-specific element. It is therefore up to the domain to carry out the modeling and to map it to the repository internal representation (i.e., with Atoms and Connections).

[0104] This may be a pre-requisite for some embodiments because applications may interact with the repository through a common interface that lies underneath a domain specific layer controlled by the application. Further, applications may control how their data is persisted without requiring bespoke support by the repository. In addition, a common repository may be unaware of its clients, but may provide a set of common, generic infrastructure services that is sufficient to support a heterogeneous set of applications (e.g., STUDIO® may support PETREL®, TECHLOG®, GEOFRAME®, AVOCET®, etc.) without providing domain specific features, which potentially may change whenever there is a change in the clients' domain modeling.

[0105] How can an Application carry out the mapping between the Domain Model and the Internal Representation? There may be a Repository Client (part of the backend framework) that the application uses to interact with the repository. The set of APIs offered by the client is used by the application to store and retrieve information in its internal representation. The application owns both the domain model and its mapping.

[0106] This approach promotes a separation of concerns between the application/domain and the repository. In principle the domain may be focused on the modeling without suffering from limitations imposed by the backend. Further, the repository may concentrate on delivering the best software development kit (SDK) possible to allow the domain to design its models and their mapping to the backend representation. Information communicated between the client and the repository may be mainly limited to exchanging, on demand, knowledge about changed information. The client can easily verify whether the repository has any updated information by a simple comparison of the root node versions.

[0107] Granularity of Model Decomposition

[0108] The granularity of the mapping may be decided by the domain. In earlier sections, an Atom was described as the smallest addressable piece of information that can be persisted. However, the Atom may not be the smallest piece of information that is modeled.

[0109] For example, consider a domain that makes use of (2D or 3D) grids. Grids are constituted by cells. A good candidate to be mapped to an Atom might be "the individual cell" or even: "the entire grid", or more generically, "a region of interest". The "region of interest" may depend on the context. The domain may decide, instead of the repository.

[0110] Figure 11 depicts several granularity choices. In particular, Figure 11 depicts a diagram consistent with Figure 8. The depiction may be coarsened 1102 by grouping atoms or refined 1104 by utilizing additional atoms, as shown in Figure 11.

[0111] Interaction with the Repository

[0112] After grouping domain model elements into Atoms (according to a suitable granularity), persistence may be further considered. Without loss of generality, a Repository may be viewed as a black-box, where the data is stored/loaded into/from it.

[0113] An Atom and its Proxy

[0114] The repository may be completely agnostic to how the ATOMs and CONNECTIONs are represented by the Repository Client. The Proxy may support an explicit domain modelling of ATOMs and CONNECTIONs. Given an Atom, its "Proxy" is a container that allows for logically referring to an Atom (without, for example, holding a local copy). It is through its Proxy that an Atom is first created and later stored in the Repository. Loading/storing an Atom from/into a Repository may also happen via this wrapper: this gives support to Load-on-Demand and Incremental Updates.

[0115] During Load-on-Demand, an application refers to a (persisted) Atom via its Proxy, although the data is fetched from the repository only when needed. In similar fashion, when an Atom is persisted (saved) via its Proxy, the framework, in the background, identifies which of the aggregated Atoms are changed, leading to Incremental Updates.
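Load-on-Demand through a Proxy may be sketched as a lazily populated wrapper. The Repository and Proxy classes below, including the fetch method and the fetch_count counter, are hypothetical stand-ins for illustration and do not describe an actual client API.

```python
class Repository:
    """Hypothetical in-memory stand-in for the persisted store."""

    def __init__(self, states):
        self._states = dict(states)   # (guid, version) -> Document
        self.fetch_count = 0

    def fetch(self, guid, version):
        self.fetch_count += 1
        return self._states[(guid, version)]


class Proxy:
    """Logical reference to a persisted Atom State; data is loaded lazily."""

    def __init__(self, repository, guid, version):
        self.repository = repository
        self.guid = guid
        self.version = version
        self.instantiated = False     # True once the Atom object is held
        self._document = None

    @property
    def document(self):
        if not self.instantiated:
            # First access: fetch the bulk data from the repository.
            self._document = self.repository.fetch(self.guid, self.version)
            self.instantiated = True
        return self._document


repo = Repository({("A", 1): "doc A"})
proxy = Proxy(repo, "A", 1)   # no data transferred yet
```

Creating the Proxy transfers nothing; the Document is fetched on the first access to proxy.document and is reused afterwards.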

[0116] An Atom is instantiated when its Proxy holds an Atom object (e.g., not just a logical reference). An Atom is persisted when its current State is stored in the Repository - hence it has a GUID and a Version. An Atom is defined as dirty when its current State is not persisted in the repository.

[0117] A repository can be split into logical parts identified by Repository ID. An Atom State has a Context, including GUID, Version and Repository ID. The Atom Context is used to uniquely identify and reference a persisted Atom State. This demonstrates how easily some embodiments can support more advanced concepts.

[0118] Atom Persistence via its Proxy

[0119] "Persisting a model" includes ensuring that its parts are present (i.e., "persisted") in the Repository, in other words, ensuring that dirty Atoms are persisted. In order to identify the subset of dirty Atoms, the corresponding DAG data structure is visited from the root of the (sub-) model depth-first and is persisted in the same order, i.e., children before parents.

[0120] Based on the dependency direction in the Atom-based representation, the persistence of the entire model can be robustly completed in one pass. Because the referencing Atom (i.e., Parent) depends on the referenced Atoms (i.e., Children), but not the other way around, when the parent Atom is persisted, it will connect to already persisted Atoms (its children), ensuring Referential Integrity (i.e., "no dangling connections").

[0121] The simpler Atoms are persisted first, and because persisting an individual Atom is achieved in a short transaction, the system will be loaded (intensively exercised) but not locked while a model is being persisted.
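The one-pass, children-before-parents persistence described above corresponds to a post-order depth-first traversal. The following is a minimal sketch in which the set of dirty Atoms is assumed to be given, the repository is reduced to a list recording the order of persistence, and all names are illustrative.

```python
def persist(root, children, dirty, repository):
    """Visit the DAG depth-first from `root`; persist dirty Atoms post-order."""
    visited = set()

    def visit(guid):
        if guid in visited:           # shared children are handled once
            return
        visited.add(guid)
        for child in children.get(guid, []):
            visit(child)
        if guid in dirty:
            # Every child has been visited, so the parent only ever
            # connects to already persisted Atoms.
            repository.append(guid)

    visit(root)
    return repository


# Figure 8 layout: X -> {Y, C}, Y -> {A, B}; A was edited, so Y and X are dirty.
children = {"X": ["Y", "C"], "Y": ["A", "B"]}
order = persist("X", children, dirty={"A", "Y", "X"}, repository=[])
```

The resulting order is A, then Y, then X; clean Atoms (B and C) are skipped, and each append stands for one short transaction.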

[0122] Although the scanning of the DAG occurs via the Atom Proxy, Proxies are not persisted (they are transient entities).

[0123] Dirty-ness Criteria

[0124] An Atom that is not instantiated is not dirty: its Proxy only logically refers to the Atom and hence cannot have changed the Atom content. A newly-created Atom is dirty: it cannot have been already persisted. An Atom whose Document or Connections have changed is dirty.

[0125] A change in Connections occurs when the connected Atom has changed, or the connection points to something different (either a different Atom or a different version of the same Atom). Checks for dirty-ness can always be done on the client side. This is particularly suitable in a distributed scenario.

[0126] Example 3: a simple Multi-user Scenario

[0127] Figure 12 illustrates a model in a multi-user scenario. As shown in Figure 12, version 1 of model X (aggregating A and B both at version 1) is fetched by two different users (User One and User Two). Later on, a third user (not represented in Figure 12) persists a newer version of B 1202 (i.e., B₂ appears in the Repository). At this point some rules might suggest that B₂ 1202 is forced upon the two users or that they may be shielded from external changes.

[0128] For increased flexibility, assume that both users are offered the option of upgrading from B₁ to B₂, and that User One accepts 1204 and User Two declines 1206. When the users decide to persist their model, the check for dirty-ness is applied (from the top-root, i.e., X₁ via its Proxy) with the following results: A is clean for both users (A₁ is already persisted) and B is also clean for both users. User One decided to sync 1204; hence, User One is already pointing to a persisted (the latest) version of B. Further, there may be no change on User Two's model overall (X₁): User Two made no changes and declined to sync 1206. However, User One's X₁ is dirty (see asterisk '*'): even if nothing else has changed, X₁ connects to a different object (B₂ instead of B₁) and because Connections are owned by the parent Atom, X₁ has indeed changed. Upon persisting X₁*, X₂ will be created.

[0129] The behavior described above is used in a system with multi-user support. Dirty-ness checking may not involve data comparison.
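The dirty-ness criteria and the outcome of Example 3 may be sketched as follows. The Proxy fields, the document_changed flag, and the relation name "parts" are assumptions made for this sketch; connections are compared by the identity of their targets (GUID and version) only, so that no Document content is compared.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Link = Tuple[str, str, int]   # (relation name, child GUID, child version)


@dataclass
class Proxy:
    guid: str
    version: Optional[int] = None        # None: never persisted (newly created)
    instantiated: bool = False           # False: logical reference only
    document_changed: bool = False       # set by the client on local edits
    connections: FrozenSet[Link] = frozenset()
    persisted_connections: FrozenSet[Link] = frozenset()


def is_dirty(proxy):
    if not proxy.instantiated:
        return False                     # cannot have changed the Atom
    if proxy.version is None:
        return True                      # newly created, not yet persisted
    return (proxy.document_changed
            or proxy.connections != proxy.persisted_connections)


# Example 3: both users fetched X1, which connects to A1 and B1.
as_persisted = frozenset({("parts", "A", 1), ("parts", "B", 1)})
user_one_x = Proxy("X", 1, True, False,
                   frozenset({("parts", "A", 1), ("parts", "B", 2)}),
                   as_persisted)         # accepted the upgrade to B2
user_two_x = Proxy("X", 1, True, False, as_persisted, as_persisted)
```

Here is_dirty(user_one_x) holds because one connection now targets the second version of B, while is_dirty(user_two_x) does not; both results are obtained on the client side from identifiers alone.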

[0130] Incremental Updates may be perceived as being achieved by comparing what is local with what is in the Repository and copying across what is different. In a system based on immutability, the "test for changes" is replaced by a "test for presence" and the latter is achieved by Context information (e.g., GUID, Version, Repository ID, etc.) without accessing the stored (bulk) data. Because version changes are inherited by a node's ancestors, a current version number of a model may be identified by merely viewing the version identification in the model's root node. This means that comparing the Context information of the root nodes on both sides is sufficient to detect the need to do an incremental update. This provides a great advantage in a distributed environment where any unnecessary transport of information should be avoided.
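A "test for presence" may be sketched as a set-membership check on Context tuples. In this sketch the Context is modeled as a (Repository ID, GUID, Version) tuple and the repository as a set of known Contexts; both are assumptions. The sketch also assumes referential integrity, i.e., that a State present in the repository implies that the States it connects to are present as well, which is why the walk stops at the first present node.

```python
def missing_states(root, children, repo_contexts):
    """Contexts to transfer for an incremental update, children first."""
    to_send, seen = [], set()

    def visit(context):
        if context in seen or context in repo_contexts:
            return                    # present: the whole subtree is present
        seen.add(context)
        for child in children.get(context, []):
            visit(child)
        to_send.append(context)

    visit(root)
    return to_send


# Contexts already known to the repository.
repo = {("r1", "X", 1), ("r1", "A", 1), ("r1", "B", 1), ("r1", "B", 2)}

# Local model: X2 connects to A1 and B2; X3 additionally connects to a new N1.
local = {
    ("r1", "X", 2): [("r1", "A", 1), ("r1", "B", 2)],
    ("r1", "X", 3): [("r1", "A", 1), ("r1", "B", 2), ("r1", "N", 1)],
}
```

If the root Context is already present, the result is empty and nothing is transferred; otherwise only the absent States are listed, without reading any Document.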

[0131] Load-on-Demand Use Case: Data Distribution in Grid Computing

[0132] Parallel data processing in clusters of computer nodes includes the execution of suitably synchronized processes concurrently running on different CPUs. A topic in this context is Data Distribution. Different parallel frameworks may adopt different paradigms in terms of synchronous versus asynchronous distribution.

[0133] A scenario where an embodiment of the present disclosure may be applied is within a Message Passing Interface (MPI)-based framework: the synchronizing (sending) process may execute an MPI_Bcast of the Context of the Atomic structure representing a data model (via Proxies). The process being synchronized may receive the information, but retrieve the actual portion of the data when called upon, e.g., asynchronously. The size of the Context may be small, making the completion of the MPI_Bcast on each process (identified by its "MPI rank") relatively quick, resulting in an almost non-blocking call. The process later decides when to retrieve the data.

[0134] Moreover, the computationally intensive processes can be scheduled topologically close to where the data is staged, reducing the data transfer bottleneck.

[0135] Incremental Update Use Case: Horizon Interpretation on large Seismic Surveys

[0136] The interpretation activity performed by a geophysicist may be localized to a specific area of interest within a potentially large seismic survey. Consequently, persisting the work carried out equates to saving a small portion of the entire dataset. Saving the work simply involves persisting the increment of the dataset for updating.

[0137] Advantages and Limitations of a DAG Representation

[0138] Here, a DAG (Directed Acyclic Graph) is the data structure layout of choice for persisting models in the Repository. Despite coming with some trade-offs, such structure delivers considerable advantages - in particular in terms of Scalability.

[0139] The following considerations regarding dirtiness and immutability should be borne in mind when deciding whether to support non-DAG structures in the persistence framework (and whether the inherent complexity is justified).

[0140] Dirtiness - When an Atom State changes (a.k.a., "gets dirty"), establishing the propagation of changes through the data structure becomes considerably more complex and less performant (visiting a DAG is a far simpler operation than visiting a more generic graph). Any directed graph may be made into a DAG by removing a feedback vertex set or a feedback arc set. However, the smallest such set may be difficult in practice to find.

[0141] Immutability - There may not be a restriction in preserving immutability in a non-DAG data structure.

[0142] Cyclic Dependencies

[0143] DAGs may not support cyclic dependencies, which may be perceived as a limitation. In Data Modeling, there are relationships that are cyclic. In an application domain modeling context, there are use cases to support the evidence. However, here DAGs are adopted in the persistence of the domain models and not in the actual domain modeling.

[0144] Where there is a natural mapping between domain models and DAGs, the Domain can out- of-the-box follow the modeling approach here described. In other cases some trade-offs are required.

[0145] Cycle Islands

[0146] Figure 13 illustrates a model with cyclic dependencies undergoing a change. Consider the following example where three elements (A₁, B₃ and C₄) are cyclically connected 1302. When one of A, B, or C changes, their dependency causes the other two neighbors to change as well 1304. However, the immutability of the data structure may be preserved. When one or two of A, B, or C changes, because of the considerations made in "Atom Storage Optimization," although a new State is generated for the three (A₂, B₄ and C₅), the Atoms with their Document unchanged may still refer to their original Document, reducing copies.

[0147] One approach is to leverage the control of the granularity of the Atomic representation.

[0148] Figure 14 illustrates a sequence of condensing a cyclic dependency in a model. Consider a more general structure where the cyclic reference occurs 1402. The cyclic portion of the graph can be condensed into an isle - labeled Y 1404 - to be mapped to a new Atom Z 1406. In so doing, the aggravations of dealing with non-DAGs are overcome, at the expense of a coarser Atom Y. By allowing Atom Y to have multiple named documents, the Incremental Update/Load-on-Demand capabilities of a fine-granularity Atomization can be achieved. An alternative is remodelling.
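One way to identify the cyclic portions to condense is to compute strongly connected components, here with Tarjan's algorithm; this technique is named as an assumption for illustration and is not stated above. The example graph is hypothetical (X -> A, a cycle A -> B -> C -> A, and C -> D), and each component becomes one node of the condensed graph, which is acyclic.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm over a dict mapping node -> list of successors."""
    index, low = {}, {}
    stack, on_stack, result = [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:        # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(frozenset(component))

    for v in graph:
        if v not in index:
            visit(v)
    return result


def condense(graph):
    """Map every cycle island to a single node; the result is a DAG."""
    components = strongly_connected_components(graph)
    owner = {v: c for c in components for v in c}
    dag = {c: set() for c in components}
    for v, targets in graph.items():
        for w in targets:
            if owner[v] != owner[w]:
                dag[owner[v]].add(owner[w])
    return dag


graph = {"X": ["A"], "A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
dag = condense(graph)
island = frozenset({"A", "B", "C"})
```

In the condensed graph, X points to the island {A, B, C}, which in turn points to D; the island is the candidate for mapping to a single, coarser Atom.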

[0149] Serialized Connections

[0150] Another approach is for the application to drop the cycle-generating connections from the Atom structure and store related information in the Document part of the Atom. Upon deserialization, the application may read the dropped connection information and re-hydrate the relevant domain models.
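As a sketch of this approach, an application might detect cycle-generating connections as back edges of a depth-first walk, keep the remaining connections in the Atom structure, and serialize the dropped ones into the Document of the Atom they start from. The JSON layout, the key name dropped_connections, and the function names are assumptions; the serialization format is the application's choice.

```python
import json


def split_cycle_edges(graph, root):
    """Separate connections into DAG edges and cycle-closing (back) edges."""
    in_progress, done = set(), set()
    kept, dropped = [], []

    def visit(node):
        in_progress.add(node)
        for target in graph.get(node, []):
            if target in in_progress:
                dropped.append((node, target))   # would close a cycle
                continue
            kept.append((node, target))
            if target not in done:
                visit(target)
        in_progress.discard(node)
        done.add(node)

    visit(root)
    return kept, dropped


def serialize_document(guid, payload, dropped):
    """Store the dropped outgoing connections of `guid` inside its Document."""
    targets = [target for source, target in dropped if source == guid]
    return json.dumps({"payload": payload, "dropped_connections": targets})


def rehydrate(guid, document):
    """Recover the dropped connections of `guid` when deserializing."""
    stored = json.loads(document)["dropped_connections"]
    return [(guid, target) for target in stored]


graph = {"A": ["B"], "B": ["C"], "C": ["A"]}
kept, dropped = split_cycle_edges(graph, "A")
doc_c = serialize_document("C", "doc C", dropped)
```

The kept edges form a DAG that can be persisted as Connections, and combining them with rehydrate("C", doc_c) restores the original cyclic structure on the application side.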

[0151] DAGs and Scalability

[0152] One area where DAGs deliver an uplift is in the scalability of the repository by supporting sharding, delivering Horizontal Scalability. Even in the presence of Cycle Islands, each island can be sharded on different storage nodes.

[0153] Application Modeling vs. Repository Representation

[0154] Several of the concepts introduced herein may have a role in the context of Repository Persistency and Domain Modeling.

[0155] Versioning

[0156] The Atom-versioning described so far belongs to the repository representation; that is, it is part of the persistency framework and has no direct implication on an application's domain modeling. This versioning is due to the immutable nature of the Atom persisted State. When an application retrieves an atomic representation, it does so to instantiate domain models, which are not immutable: applications are likely to modify their model state and re-persist it afterwards.

[0157] Some applications may support workflows where different versions of the domain models are required. This versioning system lies in the domain territory and may not be provided by a domain-agnostic repository.

[0158] Nevertheless, the versioning system associated with the Atoms can guide an application in the design of the versioning adopted in domain workflows. The Domain designs its models in such a way that, via repository provided APIs (part of the repository extensibility), a Domain model can be mapped into an Atomic repository representation. When an application changes the state of a domain model, its modifications are reflected in the structure of its Atomic representation.

[0159] Because Atom States are versioned, the state of a Domain model can be tagged according to the versions of the Atom States. Atom States, being immutable, may remain available (until they are removed upon garbage collection); therefore, an intermediate persisted state of a Domain model can be reconstructed from the corresponding Atom States at a given point in the workflow (a persisted DAG can be reconstructed from its root Atom).

[0160] Dependency Chasing

[0161] The oriented arcs in a DAG represent Parent Atom to Child Atom relationships that identify dependencies amongst the elements that are part of the persisted representation of a domain model. These are relevant, for example, for the Garbage Collector when deciding which of the unreferenced atoms can be disposed of.

[0162] Each domain model may not exist in isolation and may be connected/dependent to/on other models. In the persisted model these dependencies can be reflected in the repository as Atom connections. It may be the responsibility of the domain to ensure that the Atom connections reflect the domain dependencies.

[0163] Document Contents

[0164] The organization of the information stored in the Document part of the Atom is driven by the application, which decides the serialization format. The repository may provide extensible serialization services, where applications can register their chosen protocol (whether standard or proprietary).
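As a non-limiting illustration, an extensible serialization service of this kind might be sketched as a registry that maps a protocol name to a pair of encode/decode callables. The class name `SerializerRegistry`, the protocol names, and the "kv-lines" format are hypothetical and serve only to show one standard and one application-specific protocol side by side.

```python
import json


class SerializerRegistry:
    """Maps a protocol name to a (serialize, deserialize) pair of callables."""

    def __init__(self):
        self._protocols = {}

    def register(self, name, serialize, deserialize):
        self._protocols[name] = (serialize, deserialize)

    def dumps(self, name, obj):
        serialize, _ = self._protocols[name]
        return serialize(obj)

    def loads(self, name, data):
        _, deserialize = self._protocols[name]
        return deserialize(data)


registry = SerializerRegistry()

# A standard protocol: JSON encoded as UTF-8 bytes.
registry.register(
    "json",
    lambda obj: json.dumps(obj, sort_keys=True).encode("utf-8"),
    lambda data: json.loads(data.decode("utf-8")),
)

# A hypothetical application-specific protocol: one "key=value" pair per line.
registry.register(
    "kv-lines",
    lambda obj: "\n".join(f"{k}={v}" for k, v in sorted(obj.items())).encode("utf-8"),
    lambda data: dict(line.split("=", 1) for line in data.decode("utf-8").splitlines()),
)

document = {"name": "Field-1", "status": "active"}
payload = registry.dumps("kv-lines", document)
```

The repository side only handles opaque bytes plus a protocol name, so the choice of format stays with the application.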

[0165] Domain Modeling in a Service-Providing Repository

[0166] In some extensible, multi-tier platforms, applications can be both producers and consumers of services. In this scenario, the semantic binding between a Domain model and its Atomic representation can be implemented inside a plugin module with server-side deployment. The client-side of the plugin (exercised by the application) contains the Domain-oriented interface and the server-side contains the actual implementation of the data representation.

[0167] When services are involved, their implementation can reside on the server-side, and may be executed in the Repository where the data may be staged. In cases where the data may be upgraded/migrated, pluggable services could be a particularly convenient way to address the requirements.

[0168] GIS Feature Atom Type as part of the Atom State

[0169] GIS data may be represented as Vectors or Rasters. The Atom model may be extended to include a TYPE field. Atoms have been so far described as including a Document part and a set of Connections to other Atoms. Atoms requiring GIS support may also have a dedicated connection to a GIS Feature, i.e., to Atoms with TYPE = GIS_T. The Connected Atoms approach for managing GIS data may leverage current GIS technologies in an optimized way.

[0170] A GIS_T Atom has the following properties: It contains a GIS feature (in vector format), e.g., Point, Point Set, Polyline, Polygon, etc. Further, it may be constant in the sense that its persisted State may not change. Thus, when it is first persisted it has version 1, and as such it remains. A GIS_T Atom can be shared by other GIS-relevant Atoms. A GIS_T Atom is reference-counted: at count zero it is tagged for garbage collection. A non-GIS_T Atom can be linked to several GIS_T Atoms. In the DAG representation of a model, a GIS_T Atom can be a leaf node, and, in some embodiments, only a leaf node.
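As a non-limiting illustration, the properties listed above (a vector feature, a constant State at version 1, sharing, and reference counting) might be sketched as follows. The names `GisAtom` and `GisAtomStore`, the field layout, and the choice to tag an Atom only when its count drops back to zero are assumptions made for this sketch.

```python
import uuid
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class GisAtom:
    """An immutable GIS feature Atom; its version stays at 1 once persisted."""
    guid: str
    feature_type: str   # e.g. "Point", "Polyline", "Polygon"
    coordinates: tuple  # tuple of (x, y) pairs in vector format
    version: int = 1


class GisAtomStore:
    """Keeps a reference count per GIS Atom and tags unreferenced ones."""

    def __init__(self):
        self._atoms = {}
        self._ref_counts = {}
        self.tagged_for_gc = set()

    def add(self, feature_type, coordinates):
        atom = GisAtom(str(uuid.uuid4()), feature_type, tuple(coordinates))
        self._atoms[atom.guid] = atom
        self._ref_counts[atom.guid] = 0
        return atom

    def acquire(self, guid):
        """Record that one more Atom references this GIS feature."""
        self._ref_counts[guid] += 1
        self.tagged_for_gc.discard(guid)

    def release(self, guid):
        """Drop one reference; at count zero the Atom is tagged for collection."""
        self._ref_counts[guid] -= 1
        if self._ref_counts[guid] == 0:
            self.tagged_for_gc.add(guid)


store = GisAtomStore()
polygon = store.add("Polygon", [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 0.0)])
store.acquire(polygon.guid)  # referenced by one Atom
store.acquire(polygon.guid)  # shared by a second Atom
store.release(polygon.guid)  # still referenced once
store.release(polygon.guid)  # count reaches zero: tagged for garbage collection
```

Freezing the dataclass makes any attempt to modify a persisted feature fail, which is one simple way to keep the State constant in application code.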

[0171] As a simple conceptual example, consider the modeling of a Field, its geographical position and the way it is visualized according to zooming. A close-up may appear as a Polygon, but when zooming out it may be represented by just a Point.

[0172] Example 6: Atomic representation with GIS Support

[0173] Figure 15 illustrates an atomic representation with GIS support. In more detail, Figure 15 shows that GIS support may be added to the model of Figure 2. Because the diagram shown in Figure 15 is mainly concerned with the model representation rather than its editing, the version subscripts have been omitted, assuming they are set to 1.

[0174] The extension to the model in Figure 2 includes that the non-leaf Atom X 1502 and the leaf Atoms B 1504 and C 1506 have GIS support. There are three GIS features (denoted as triangles): FA 1508 used by B 1504; FB 1510 used by C 1506; and FC 1512 shared by C 1506 and X 1502. The GIS connections (g with subscripts) represent the relationships labeled g_a, g_b, and g_c. The extension described above is reflected in the database tables as follows.

ATOM

CONNECTION

GIS CONNECTION

P_GUID | FROM_VER | TO_VER | NAME | GIS_GUID

[0175] Modeling GIS data connections in this style brings several benefits. For example, Atoms are mapped to a GIS feature rather than to a GIS provider/source. In the same way as Connections belong to the parent Atom, GIS Connections do too: an Atom persisted with different GIS Connections represents a different State (increased version). Atom immutability may be preserved. Modeling GIS information outside the referencing Atom may represent a further enhancement: several Atoms can share the same GIS feature, reducing replication of information. At least some GIS features may not be replicated, but uniquely identified by their GUID: different values of GIS_GUID address different GIS features (under given geographical information). GIS information is one of the leading parameters for "search" functionality (i.e., indexing): having it factored out from the Document part of the Atom allows information to be located more efficiently.
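As a non-limiting illustration, the GIS CONNECTION table might be realized in a relational database as sketched below, using the column names listed above. The column types, the use of NULL in TO_VER to mean "still valid", and the assignment of the connection names g_a, g_b, and g_c to individual rows are assumptions made for this sketch; only the pairing of Atoms B, C, and X with features FA, FB, and FC follows the example of Figure 15.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE GIS_CONNECTION (
        P_GUID   TEXT    NOT NULL,  -- parent (referencing) Atom
        FROM_VER INTEGER NOT NULL,  -- first parent version the connection applies to
        TO_VER   INTEGER,           -- assumed: NULL means the connection is still valid
        NAME     TEXT    NOT NULL,  -- relation name
        GIS_GUID TEXT    NOT NULL   -- referenced GIS feature Atom
    )
    """
)

# FA is used by B, FB by C, and FC is shared by C and X.
rows = [
    ("B", 1, None, "g_a", "FA"),
    ("C", 1, None, "g_b", "FB"),
    ("C", 1, None, "g_c", "FC"),
    ("X", 1, None, "g_c", "FC"),
]
conn.executemany("INSERT INTO GIS_CONNECTION VALUES (?, ?, ?, ?, ?)", rows)


def atoms_using_feature(gis_guid):
    """Follow the reverse reference: which Atoms point at this GIS feature?"""
    cursor = conn.execute(
        "SELECT P_GUID FROM GIS_CONNECTION WHERE GIS_GUID = ? ORDER BY P_GUID",
        (gis_guid,),
    )
    return [row[0] for row in cursor]


shared = atoms_using_feature("FC")
```

Because the feature is stored once and referenced by GUID, finding every Atom tied to the same location reduces to a single lookup on GIS_GUID.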

[0176] Figure 16 depicts a flowchart of a method according to various embodiments. The method may be implemented in whole or in part using the hardware shown and disclosed in reference to Figure 17.

[0177] At block 1602, the method accesses an electronic hardware repository that is communicatively coupled to a computer network such as the internet. The access may be by way of physically accessing the repository (e.g., acquiring persistent storage hardware), financially accessing the repository (e.g., renting or otherwise acquiring use of the repository), or communicatively accessing the repository (e.g., configuring the repository).

[0178] At block 1604, the method stores GIS atomic units. Such GIS atomic units may be GIS_T Atoms as disclosed herein, for example. The GIS atomic units may be stored as disclosed herein, e.g., in a plurality of tables in a relational database or in a DAG database. Regardless of the specific database used to store the GIS atomic units, they may be formatted, updated, and processed as disclosed herein.

[0179] At block 1606, the method stores non-GIS atomic units. Such non-GIS atomic units may be non-GIS_T Atoms as disclosed herein, for example. Like the GIS atomic units, the non-GIS atomic units may be stored as disclosed herein, e.g., in a plurality of tables in a relational database or in a DAG database. Regardless of the specific database used to store the non-GIS atomic units, they may be formatted, updated, and processed as disclosed herein.

[0180] At block 1608, the method performs garbage collection on the stored data. The garbage collection may proceed as disclosed herein. For example, atomic units that are not linked to any other atomic units as children may be tagged for garbage collection, and deleted as part of the garbage collection procedure.

[0181] At block 1610, the method detects an alteration to an atomic unit, and, in response, generates and persists a later version of the altered atomic unit. In addition, the method generates and persists newer versions of all atomic unit ancestors (i.e., atomic units connected to the altered atomic unit as parents, grandparents, great-grandparents, etc.). The alteration may have been made by a user accessing the stored data, for example.
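As a non-limiting illustration, the propagation of a new version from an altered atomic unit to all of its ancestors might be sketched as follows. The sketch assumes that the current version of each atomic unit is held in a GUID-to-version map and that connections are (parent, child) pairs; the function name and the example GUIDs are hypothetical, and earlier versions are assumed to remain persisted elsewhere.

```python
from collections import defaultdict


def propagate_alteration(versions, connections, altered_guid):
    """Return a new version map in which the altered unit and every ancestor
    (parents, grandparents, and so on) carry a version increased by one."""
    parents = defaultdict(set)
    for parent, child in connections:
        parents[child].add(parent)

    to_bump = set()
    stack = [altered_guid]
    while stack:
        guid = stack.pop()
        if guid in to_bump:
            continue  # an ancestor reached along several paths is bumped once
        to_bump.add(guid)
        stack.extend(parents[guid])

    # The input map is left untouched, in keeping with immutable persisted States.
    return {g: (v + 1 if g in to_bump else v) for g, v in versions.items()}


# Hypothetical model: X is the root; C is reachable directly and through A.
connections = [("X", "A"), ("X", "B"), ("A", "C"), ("X", "C")]
versions = {"X": 1, "A": 1, "B": 1, "C": 1}
after = propagate_alteration(versions, connections, "C")
```

Unit B is not an ancestor of C, so its version is unchanged and its persisted State can be shared between the old and the new version of the model.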

[0182] Organizing the Repository: Collections

[0183] Branching is a known concept in version control systems. Collections are a form of branching where only the changed information is duplicated and the system keeps track of those changes, making it easier to harvest potentially important results back to the "main branch": the Golden Collection. Collections may be organized in hierarchies and may be identified by the Repository ID, a part of the Atoms Context mentioned earlier.

[0184] Collections can facilitate collaboration at multiple levels, with a strong emphasis on avoiding any unnecessary duplication of information, making it easier for end users to find the right data with the right quality. In some embodiments, Private Collections allow users to work on their private version, but the only private information stored may be the information that the user has actually changed. All the other project information may be shared.
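As a non-limiting illustration, a Private Collection that stores only the user's changes and otherwise falls back to a shared parent Collection might be sketched with Python's standard `collections.ChainMap`. The collection contents, the key names, and the `harvest` helper are hypothetical.

```python
from collections import ChainMap

# Shared project information (the parent Collection).
golden = {
    "well-1": "state v1",
    "well-2": "state v1",
    "field-1": "state v1",
}

# The Private Collection is an initially empty overlay on top of the parent.
# Lookups fall through to the parent; writes land in the overlay only.
private = ChainMap({}, golden)
private["well-2"] = "state v2 (edited privately)"

private_only = private.maps[0]  # the only information stored privately


def harvest(overlay, parent):
    """Bring the privately changed entries back into the parent Collection."""
    parent.update(overlay)
```

Here `private["well-1"]` resolves to the shared entry, while `private["well-2"]` resolves to the private edit; the parent is unchanged until `harvest` is called.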

[0185] The Atom State Cache

[0186] For efficiency reasons, cached data may be employed. In client-server systems, data can be cached either on the remote side or locally. As an example, data can be cached on the client side and used while there is no connection to the server.

[0187] Whenever cached data is involved, a legitimate concern is whether the cached information is still valid/usable - as the original data source may have changed since the creation of the cache. Because of the immutability of the persisted Atom States an individual element may never change. If the Atom changes, there may simply be a newer (higher) version of its State.

[0188] Atomic State Cache Garbage Collection

[0189] Cached Atom State versions should not grow indiscriminately and, when no longer required, may be disposed of. This operation is a type of garbage collection. An effective way of handling Atomic State Cache garbage collection is by time-tagging the cached elements: upon expiration, or when size limits are reached (whichever is the more stringent), cached data is purged. However, by not purging an Atomic State Cache upon client-server disconnection, the system may leverage the presence of the cached data for efficiency at the next reconnection (especially when there is a sufficiently short time span between disconnection and reconnection).
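As a non-limiting illustration, a time-tagged Atom State cache with both an expiration and a size limit might be sketched as follows. The class name `AtomStateCache`, its parameters, and the oldest-first eviction order are assumptions made for this sketch. Entries are keyed by (GUID, version); because a persisted State is immutable, a cached entry never becomes stale and is removed only by purging.

```python
import time
from collections import OrderedDict


class AtomStateCache:
    """Caches immutable Atom States keyed by (guid, version)."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self._entries = OrderedDict()  # (guid, version) -> (state, cached_at)
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def put(self, guid, version, state):
        key = (guid, version)
        self._entries[key] = (state, self._clock())  # time-tag the element
        self._entries.move_to_end(key)
        self.purge()

    def get(self, guid, version):
        entry = self._entries.get((guid, version))
        return None if entry is None else entry[0]

    def purge(self):
        """Drop expired entries, then the oldest ones beyond the size limit."""
        now = self._clock()
        expired = [k for k, (_, at) in self._entries.items() if now - at > self._ttl]
        for key in expired:
            del self._entries[key]
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)


# Usage with a controllable clock so the behavior is easy to follow.
now = [0.0]
cache = AtomStateCache(max_entries=2, ttl_seconds=10, clock=lambda: now[0])
cache.put("A", 1, "state of A, version 1")
cache.put("A", 2, "state of A, version 2")
cache.put("B", 1, "state of B, version 1")  # size limit reached: ("A", 1) is dropped
```

Nothing in this sketch reacts to a disconnection, so the cached States simply remain available at the next reconnection until they expire or are displaced.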

[0190] Example Use Cases

[0191] The following presents several example scenarios for which embodiments can provide viable solutions.

[0192] Incremental Loading/Saving of Projects

[0193] Some software suites store data models primarily on the user's local hard disk, where the user's work is saved in a set of files and directories. Such projects can reach considerable dimensions, and loading an entire project in one operation can result in the application being unavailable until the full project is loaded. The user experience suffers as a result.

[0194] Load on Demand

[0195] By modeling a project as a DAG and the models as sub-graphs, some embodiments can load at start-up a minimal amount of information. According to the user's requirements/needs, more related data (e.g., stored as connected atoms in the backend) may be loaded and represented in the application as domain models.
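As a non-limiting illustration, loading connected atoms only when the user navigates to them might be sketched as follows. The classes `InMemoryBackend` and `LazyNode`, the `fetch_children` method, and the example GUIDs are hypothetical; the in-memory backend merely stands in for a repository and records which fetches were actually issued.

```python
class InMemoryBackend:
    """Stand-in for the repository; logs every fetch of an atom's children."""

    def __init__(self, connections):
        self.connections = connections  # parent GUID -> list of child GUIDs
        self.fetch_log = []

    def fetch_children(self, guid):
        self.fetch_log.append(guid)
        return list(self.connections.get(guid, []))


class LazyNode:
    """A node of the project DAG whose children are loaded on first access."""

    def __init__(self, guid, backend):
        self.guid = guid
        self._backend = backend
        self._children = None  # nothing loaded at start-up

    def children(self):
        if self._children is None:
            child_guids = self._backend.fetch_children(self.guid)
            self._children = [LazyNode(g, self._backend) for g in child_guids]
        return self._children


backend = InMemoryBackend(
    {
        "project": ["model-1", "model-2"],
        "model-1": ["grid", "wells"],
        "model-2": ["horizons"],
    }
)
project = LazyNode("project", backend)  # start-up: only the root is known
models = project.children()             # first fetch, triggered by the user
model_1_parts = models[0].children()    # second fetch; model-2 stays unloaded
```

Repeated calls to `children()` on the same node reuse what was already loaded, so each sub-graph is fetched at most once per node.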

[0196] Incremental Updates

[0197] In a similar fashion, the user's work may be incrementally persisted. The efficient calculation of "what has changed" according to some embodiments makes this approach particularly appropriate.
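As a non-limiting illustration, one possible way of calculating "what has changed" before an incremental save is to compare a fingerprint of each document with the fingerprint recorded at the last save. This passage does not prescribe a detection mechanism; the use of SHA-256 over a canonical JSON encoding, and the names `fingerprint` and `changed_atoms`, are assumptions made for this sketch.

```python
import hashlib
import json


def fingerprint(document):
    """Hash a canonical JSON encoding of a document."""
    data = json.dumps(document, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def changed_atoms(persisted_fingerprints, current_documents):
    """Return the GUIDs whose documents are new or differ from the last save."""
    return sorted(
        guid
        for guid, document in current_documents.items()
        if persisted_fingerprints.get(guid) != fingerprint(document)
    )


# State recorded at the last save.
saved_documents = {
    "well-1": {"depth": 1200, "status": "drilled"},
    "well-2": {"depth": 900, "status": "planned"},
}
persisted = {guid: fingerprint(doc) for guid, doc in saved_documents.items()}

# The user edits one model and adds another.
current_documents = {
    "well-1": {"depth": 1200, "status": "drilled"},
    "well-2": {"depth": 950, "status": "planned"},
    "well-3": {"depth": 400, "status": "planned"},
}
to_persist = changed_atoms(persisted, current_documents)
```

Only the documents listed in `to_persist` would need to be written, which keeps the amount of data transferred proportional to the size of the change.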

[0198] Remote Storage and The Cloud

[0199] The incremental execution of the load/save operations according to some embodiments gives particular advantages when the storage is not local but remote: minimizing the data being transferred "over-the-wire" gives a clear edge. Moreover, with the ever-increasing adoption of cloud storage, these techniques may have an even bigger impact.

[0200] Lineage of a Data Model

[0201] Be it a persistence data model or domain data model, it is sometimes useful to be able to reconstruct the history of a model: from its creation through its evolution.

[0202] The Connected Atoms Approach

[0203] By representing models via Connected Atoms, all versions of the model (that have not been garbage collected) may be available at any time. Although this may appear expensive, it is made possible by: (1) an efficient way of detecting what has actually changed in a model, and (2) the fact that versions of a model may simply be logical views of the data: when different versions have parts in common, they point to the same physical representation.

[0204] Native Support for Versioning

[0205] Because of the immutability of the Atoms State, any persisted model may be naturally versioned. Domains can leverage this property to implement versioning of domain models.

[0206] In contrast, a lack of native support for versioning in a data storage system can pose significant limitations. In fact, even in consolidated technologies like MONGO DB, there is no easy workaround for this missing feature.

[0207] Geographical Information System Support

[0208] As disclosed herein, some embodiments provide an innovative approach in the area of GIS support. Due to the storage independent persistence of immutable documents, the GIS information may be persisted outside the documents: this may make possible the utilization of GIS services provided by 3rd parties.

[0209] The persisted GIS information may be immutable and can be referenced by Atoms. The very same information may also be persisted once. This leads to the following advantages:

• By following the reverse reference it is possible to easily detect what information is geographically related.

• In many information technology systems, GIS processing for identifying geographical relationship between different data can be very time consuming. According to some embodiments, only incremental processing is used when a new geographical location or area is added.

• It may be easier to support user defined data selections using GIS because the total amount of persisted GIS information may now be considerably smaller.

• There may be no need to replicate GIS information in a global setting.

• The immutable GIS information may be cached everywhere, just like other Atom data.

[0210] Single Document versus Multiple Documents

[0211] In some embodiments, an Atom contains a single document. Other embodiments may support multiple named documents by modeling the Atom-to-document relationships in the same way as the Atom-to-Atom connections. Such embodiments may still persist each sub-document once unless its information has changed. The new version of the main document may be persisted afterwards.

[0212] Today, geologists spend a long time looking into past projects in search of the best possible data to use in their next project. Some embodiments solve this problem by presenting data in a domain-independent format.

[0213] Example Computer Hardware

[0214] In some embodiments, the methods of the present disclosure may be executed by a computing system.

[0215] Figure 17 illustrates an example of such a computing system 1700, in accordance with some embodiments. The computing system 1700 may include a computer or computer system 1701A, which may be an individual computer system 1701A or an arrangement of distributed computer systems. The computer system 1701A includes one or more analysis modules 1702 that are configured to perform various tasks according to some embodiments, such as one or more methods disclosed herein. To perform these various tasks, the analysis module 1702 executes independently, or in coordination with, one or more processors 1704, which is (or are) connected to one or more storage media 1706. The processor(s) 1704 is (or are) also connected to a network interface 1707 to allow the computer system 1701A to communicate over a data network 1709 with one or more additional computer systems and/or computing systems, such as 1701B, 1701C, and/or 1701D (note that computer systems 1701B, 1701C and/or 1701D may or may not share the same architecture as computer system 1701A, and may be located in different physical locations, e.g., computer systems 1701A and 1701B may be located in a processing facility, while in communication with one or more computer systems such as 1701C and/or 1701D that are located in one or more data centers, and/or located in varying countries on different continents).

[0216] A processor may include a microprocessor, microcontroller, processor module or subsystem, programmable integrated circuit, programmable gate array, or another control or computing device.

[0217] The storage media 1706 may be implemented as one or more computer-readable or machine-readable storage media. Note that while in the example embodiment of Figure 17 storage media 1706 is depicted as within computer system 1701A, in some embodiments, storage media 1706 may be distributed within and/or across multiple internal and/or external enclosures of computing system 1701A and/or additional computing systems. Storage media 1706 may include one or more different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories, magnetic disks such as fixed, floppy and removable disks, other magnetic media including tape, optical media such as compact disks (CDs) or digital video disks (DVDs), BLURAY® disks, or other types of optical storage, or other types of storage devices. Note that the instructions discussed above may be provided on one computer-readable or machine-readable storage medium, or alternatively, may be provided on multiple computer-readable or machine-readable storage media distributed in a large system having possibly plural nodes. Such computer-readable or machine-readable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture may refer to any manufactured single component or multiple components. The storage medium or media may be located either in the machine running the machine-readable instructions, or located at a remote site from which machine-readable instructions may be downloaded over a network for execution.

[0218] In some embodiments, computing system 1700 contains one or more atom processing module(s) 1708. In the example of computing system 1700, computer system 1701A includes the atom processing module 1708. In some embodiments, a single atom processing module may be used to perform some aspects of one or more embodiments of the methods disclosed herein. In alternate embodiments, a plurality of atom processing modules may be used to perform some aspects of methods herein.

[0219] It should be appreciated that computing system 1700 is only one example of a computing system, and that computing system 1700 may have more or fewer components than shown, may combine additional components not depicted in the example embodiment of Figure 17, and/or computing system 1700 may have a different configuration or arrangement of the components depicted in Figure 17. The various components shown in Figure 17 may be implemented in hardware, software, or a combination of both hardware and software, including one or more signal processing and/or application specific integrated circuits.

[0220] Further, the steps in the processing methods described herein may be implemented by running one or more functional modules in information processing apparatus such as general purpose processors or application specific chips, such as ASICs, FPGAs, PLDs, or other appropriate devices. These modules, combinations of these modules, and/or their combination with general hardware are included within the scope of protection of the invention.

[0221] The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. Moreover, the order in which the elements of the methods described herein are illustrated and described may be re-arranged, and/or two or more elements may occur simultaneously. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated.

Claims

CLAIMS What is claimed is:

1. A system for storing geographic information system (GIS) data in a domain-independent representation, the system comprising:

an electronic hardware data repository communicatively coupled to a computer network, the electronic hardware data repository comprising at least one electronic persistent memory device;

wherein the electronic persistent memory device stores GIS data in a plurality of GIS atomic units each comprising a globally unique identifier and a state comprising a GIS feature;

wherein the electronic persistent memory device further stores a first plurality of non-GIS atomic units each comprising at least one document and at least one connection, wherein each document of the first plurality of non-GIS atomic units comprises a globally unique identifier and a version identifier, and wherein each connection of the first plurality of non-GIS atomic units is directed from its parent non-GIS atomic unit to a child atomic unit, is non-reflexive, and comprises a parent globally unique identifier, a child globally unique identifier, a relation identification, and version applicability information;

wherein a second plurality of the first plurality of non-GIS atomic units each comprise at least one connection to a GIS atomic unit child, each of the second plurality of non-GIS atomic units comprising data representing a physical property value of a GIS feature of a connected GIS atomic unit child;

wherein the hardware data repository is configured to persist, unless subjected to garbage collection, an initial version and all subsequent versions of each document of each of the first plurality of non-GIS atomic units;

wherein the hardware data repository is configured to garbage collect at least the document of GIS atomic units that are not connected to a persisted atomic unit as a child; and

wherein the hardware data repository is configured to generate and persist a later version of any altered atomic unit and any ancestor atomic units thereof.

2. The system of claim 1, further comprising a publicly-available plugin module, wherein a client side of the plugin module comprises a domain-oriented interface, and wherein a server side of the plugin module comprises the electronic persistent memory device.

3. The system of claim 2, further comprising at least one electronic processor communicatively coupled to the electronic persistent memory device, wherein the client side of the plugin module comprises controls configured to permit a client side user to cause the at least one electronic processor to execute a server side service on at least a portion of data stored in the electronic hardware repository.

4. The system of claim 1, wherein the version applicability information comprises information identifying at least one version of a parent atomic unit for which a respective connection is valid and information identifying at least one version of a child atomic unit for which a respective connection is valid.

5. The system of claim 1, wherein a plurality of GIS features are in serialized vector format.

6. The system of claim 1, wherein a plurality of GIS features represent at least one of a point, polyline, polygon, or multipart feature.

7. The system of claim 1, wherein each of the second plurality of non-GIS atomic units comprise data representing a physical property value of pressure, temperature, flow rate, porosity, or chemical composition.

8. The system of claim 1, wherein the plurality of GIS atomic units and the first plurality of non-GIS atomic units are stored in at least one of: a relational database, a graph database, a document database, or a key value storage.

9. The system of claim 1, wherein the GIS atomic units further comprise version information.

10. The system of claim 1, wherein a version of a stored model consisting of a connected plurality of the atomic units is represented by version information in a root node of the connected plurality of atomic units.

11. A method of storing geographic information system (GIS) data in a domain-independent representation, the method comprising:

accessing an electronic hardware data repository communicatively coupled to a computer network, the electronic hardware data repository comprising at least one electronic persistent memory device;

storing, in the electronic persistent memory device, GIS data in a plurality of GIS atomic units each comprising a globally unique identifier and a state comprising a GIS feature;

storing, in the electronic persistent memory device, a first plurality of non-GIS atomic units each comprising at least one document and at least one connection, wherein each document of the first plurality of non-GIS atomic units comprises a globally unique identifier and a version identifier, and wherein each connection of the first plurality of non-GIS atomic units is directed from its parent non-GIS atomic unit to a child atomic unit, is non-reflexive, and comprises a parent globally unique identifier, a child globally unique identifier, a relation identification, and version applicability information;

storing, in the electronic persistent memory device, a second plurality of the first plurality of non-GIS atomic units each comprising at least one connection to a GIS atomic unit child, each of the second plurality of non-GIS atomic units comprising data representing a physical property value of a GIS feature of a connected GIS atomic unit child;

persisting, unless subjected to garbage collection, an initial version and all subsequent versions of each document of each of the first plurality of non-GIS atomic units;

garbage collecting at least the document of GIS atomic units that are not connected to a persisted atomic unit as a child; and

generating and persisting a later version of any altered atomic unit and any ancestor atomic units thereof.

12. The method of claim 11, further comprising providing a publicly-available plugin module, wherein a client side of the plugin module comprises a domain-oriented interface, and wherein a server side of the plugin module comprises the electronic persistent memory device.

13. The method of claim 12, further comprising providing at least one electronic processor communicatively coupled to the electronic persistent memory device, wherein the client side of the plugin module comprises controls configured to permit a client side user to cause the at least one electronic processor to execute a server side service on at least a portion of data stored in the electronic hardware repository.

14. The method of claim 11, wherein the version applicability information comprises information identifying at least one version of a parent atomic unit for which a respective connection is valid and information identifying at least one version of a child atomic unit for which a respective connection is valid.

15. The method of claim 11, wherein a plurality of GIS features are in serialized vector format.

16. The method of claim 11, wherein a plurality of GIS features represent at least one of point, polyline, polygon, or multipart feature.

17. The method of claim 11, wherein each of the second plurality of non-GIS atomic units comprise data representing a physical property value of pressure, temperature, flow rate, porosity, or chemical composition.

18. The method of claim 11, wherein the plurality of GIS atomic units and the first plurality of non-GIS atomic units are stored in at least one of: a relational database, a graph database, a document database, or a key value storage.

19. The method of claim 11, wherein the GIS atomic units further comprise version information.

20. The method of claim 11, wherein a version of a stored model consisting of a connected plurality of the atomic units is represented by version information of a root node of the connected plurality of atomic units.