EP1537497A2

EP1537497A2 - Method for organizing a digital database in a traceable form

Info

Publication number: EP1537497A2
Application number: EP03769596A
Authority: EP
Inventors: Michel Zamfiroiu
Original assignee: Karmic Software Research
Current assignee: Karmic Software Research
Priority date: 2002-09-11
Filing date: 2003-09-09
Publication date: 2005-06-08
Also published as: WO2004025507A2; FR2844372B1; WO2004025507A3; FR2844372A1; US20060123059A1; AU2003278287A1

Abstract

The invention concerns a method for organizing a digital database in a traceable form, comprising steps which consist in modifying a main digital database by adding or suppressing or changing a record of the main base and steps which consist in reproducing the main database. The invention is characterized in that the step of modifying the main database includes an operation which consists in creating at least one digital recording comprising at least: the unique digital identifiers of the relevant records and attributes of the main database, a unique digital identifier of the state of the main database corresponding to said modification of the main database, the elementary values of the attributes assigned to them through the elementary operations, without storing the unmodified attributes and records, and adding said record in an internal historization base consisting of at least one table, and in that the reproduction step concerning any final or prior state of the main database consists in receiving (or intercepting) an original request associated with the identifier of the state concerned, in transforming said original request to constitute a modified addressing request of the historization base comprising the criteria of the original request and the identifier of the state concerned, and reconstituting the record(s) corresponding to the criteria of the original request and to the state concerned, the method further comprising mechanisms enabling the monitoring of the operations, the causal dependencies and the impact of the modifications, as well as the fusion of branches or the evolution of the data structure.

Description

METHOD FOR ORGANIZING A DIGITAL DATABASE IN A TRACEABLE FORM

The present invention relates to the field of persistent data management of an entity, for example a company. In particular, the present invention relates to the monitoring of this persistent data in a database by means of a Database Management System. It is indeed difficult for a company to guarantee the monitoring of the evolution process of strategic persistent data because this monitoring presents some objective obstacles:

• Asynchronous and collaborative nature of the process

• Very demanding nature of follow-up to constitute a real guarantee: the presence of a weak link definitely compromises the reliability of any response

• Non-availability of generic solutions to support traceability in the software layers of the market at a satisfactory level of granularity: OS, DBMS, development language

• Very high cost of rewriting existing applications and very high cost of taking explicit account of traceability by each application.

The prior art already knows from international patent application WO 9935566 a method of identifying and monitoring the development of a set of software components. The method proposed by this document of the prior art makes it possible to identify components by their name and their version. This classification at file level does not correspond to the problem of keeping traces of data continuously, that is to say at each modification of said data. In particular, the process proposed is not suitable for tracing a modified database on each write access.

It is proposed, in American patent US 5,347,653 a method providing a historical perspective to a database of objects thanks to a versioning of the stored objects as well as to a representative indexing of the objects. This prior art method proposes to store the latest version of the database in its entirety and on the other hand to store the differences to be applied to this latest version in order to obtain earlier versions. The problem with this document is the need to apply the differences one by one and in series to find the state of the database at a given date. This constraint involves a significant time cost.

The prior art also knows, from PCT patent application WO 02/27561 (Oracle), a system and a method for providing access to a temporal database. The invention described in this document relates to a system and method for selectively viewing temporal row data in a consistent-read database. The committed transactions causing changes to the row data of a database are tracked, and a stored system change number is assigned to each committed transaction. A requested selection of row data values from the database is performed as of a query time occurring before the commit time of at least one committed transaction. The ordered row data values contained in the rollback segments storing a transaction identifier for the at least one committed transaction are retrieved.

The prior art also knows, from PCT patent application WO 92/13310 (Tandem Telecommunications Systems), a method for selecting and representing time-varying data from a database management system, said method producing a unified view on a computer screen. Data from a master record for a particular entity is displayed with a default video or character attribute, and is considered to be the up-to-date record. Access to a historical record relating to this entity means that the data relating to fields which differ from the corresponding fields of the up-to-date record are superimposed on those fields, but with a video or character attribute different from the default one. The overlay record becomes a new up-to-date record for other overlays. Likewise, access to a pending record means that data relating to fields which differ from the corresponding fields of the up-to-date record are superimposed on those fields, but with a video or character attribute different from the default one. A plurality of historical or pending records can be composed so that all fields modified for a record set from the end of a defined period can be superimposed on an up-to-date record at one time.

Also known, from European patent application EP 0 984 369 (Fujitsu), is a mechanism for storing dated versions of data. In this storage mechanism, the data is stored as a plurality of records, each record comprising at least one attribute, a time marker indicating the duration for which the attribute is valid, an insertion time indicating when the record was created, and a type field. The type field indicates whether the record is a concrete record, a delta record, or an archived record replacing one or more records which have been archived. Data is accessed to find an attribute value from the point of view of a "specified time", by extracting the records which have insertion times earlier than the "specified time" and constructing an attribute value from the extracted records. Data is updated only by adding concrete records or delta records, without changing attribute values in existing concrete records or delta records.

The present invention intends to remedy the drawbacks of the prior art by proposing a method for monitoring the evolution of data in an architecture based on a DBMS, consisting of:

• the materialization of the intermediate versions and of the data flows resulting from the operations carried out on the database, as and when it evolves, at the level of elementary granularity (recording by recording and attribute by attribute);

• the possibility of “rapid” reconstitution and restitution of any historical framework state of origin of each data version and each operation (by “rapid” we understand “without any perceptible additional time linked to the restoration”); including:

• mechanisms for reconstructing causal dependency flows (of source-destination type) between the data concerned;

• mechanisms for notifying past operations in the event of changes in the input data;

• re-execution mechanisms;

and covering the following special cases and extensions:

• taking into account the structural evolution (evolution of the schema);

• taking into account the evolution of applications;

• taking into account existing applications in a flexible architectural framework;

• schemes for the gradual evolution of an enterprise-wide architecture;

• management of virtual versions (alternative families and parallel hypotheses).

The main object of the invention is to allow the exploitation of the data of the base according to successive versions, while limiting the needs for time and storage capacity and to authorize the restitution on the fly.

A usual approach consists in recording successive versions of the database, for example by periodically storing the entire database, in the state corresponding to the current version, on a medium such as a magnetic cartridge. Searching for information then requires first restoring the entire database from the medium holding the corresponding backup, and then querying the database thus restored. For large databases such as those used in banking, insurance or management, the volume corresponding to one state can exceed a terabyte, and this volume must be multiplied by the number of saved states.

This solution is totally unsuitable for real-time operation.

The invention aims to respond to the technical problem of real-time operation of large volume databases.

To this end, the invention relates in its most general sense to a method of organizing a digital database in a traceable form, comprising steps of modifying a main digital database by adding, deleting or modifying a record of the main database, and steps of reading the main database, characterized in that the step of modifying the main database comprises an operation of creating at least one digital record comprising at least: the unique digital identifiers of the records and attributes concerned in the main database, a digital identifier of the state of the main database corresponding to said modification of the main database, and the elementary values assigned to the attributes through elementary operations, without storing the unmodified attributes or records, and of adding said record to an internal historization database composed of at least one internal historization table, and in that the reading step relating to any final or previous state of the main database consists in receiving (or intercepting) an original request associated with the unique identifier of the targeted state, in transforming said original request to construct a modified request addressing the historization database and comprising the criteria of the original request and the identifier of the targeted state, and in reconstituting the record(s) corresponding to the criteria of the original request and to the targeted state, said reconstitution step consisting in finding the elementary values, contained in the records of the historization database, corresponding to the criteria of the original request [in order to reduce the storage capacity requirements and the processing times].

According to a variant, said records from the historization database also contain references to other records from the internal database, with the aim of specifying the dynamic dependency links of source-destination type constituting the causal flow of interference between versions of data.

Advantageously, said operation for modifying the main database is a logical operation, and said operation of adding to the historization database consists in adding: a record identifying the state of the database corresponding to the logical operation, as many records as there are parameters of the logical operation, and a record for the possible result of the logical operation, and in specifying by a family link the groupings of operations, from the elementary level of modification up to the level of the transaction, passing through as many semantic levels as the applications require.

According to another variant, the main database contains one or more tables organizing the evolution links between the identifiers of the successive and alternative states of the main database, intended to organize the records of the internal database.

Preferably, said one or more tables of evolution links between the states of the main database contain records specifying the rules for correspondence between the records of the internal historization database and the states of the main database. According to a particular implementation mode, said read operation consists in determining said state of the main database by referring to said identifiers and to the tables of evolution links between the states of the main database. Advantageously, an application interrogating the main database can specify the state of the desired main database.

The invention also relates to a database management architecture characterized in that said application can make modifications on any state of the main database, giving rise, in the case of an attempt to modify a prior state, to the creation of new alternative branches of evolution of the main database, whose data will be generated by the same internal historization database.

According to a variant, the dependency links serve as criteria for questioning said operations already carried out.

Preferably, the updates carried out on different branches can be integrated or merged in the context of a new state "inheriting" from said branches.
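As an illustration of the evolution links between states, the following minimal sketch assumes that each state identifier is linked to the state(s) it derives from, so that an alternative branch has a prior state as its parent and a merged state has several parents. The table content and the function name `visible_states` are hypothetical and serve only to show how the states contributing to a requested state could be determined; the description does not prescribe this representation, and conflict resolution between merged branches is not addressed here.

```python
# Hypothetical evolution-link table: each state identifier maps to the
# state(s) it derives from. All identifiers are invented for illustration.
PARENTS = {
    1: [],      # initial state
    2: [1],     # successor of state 1
    3: [2],     # successor of state 2 (first branch)
    4: [2],     # alternative branch started from prior state 2
    5: [3, 4],  # merged state "inheriting" from both branches
}


def visible_states(state, parents=PARENTS):
    """Return the set of states whose journal entries contribute to `state`."""
    seen = set()
    stack = [state]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(visible_states(3)))  # [1, 2, 3]
print(sorted(visible_states(4)))  # [1, 2, 4]
print(sorted(visible_states(5)))  # [1, 2, 3, 4, 5]
```

In this sketch, a read addressed to state 4 would ignore the entries written in state 3, which is one way of keeping alternative branches isolated while sharing a single internal historization database.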

According to a particular mode of implementation, evolutions of the data structure of the main database are treated as particular cases of evolution of the data of said database, provided that the structure (schema) of said main database is itself described, in the manner set out for the data, as a dictionary.

According to another variant, the historization database is explored and interrogated by applications through the native mode of the DBMS, in order to obtain information such as all the historical values of an attribute and all the (dynamic) incidences of any update, and to navigate through versions and dynamic dependency flows, in the usual way and in the query language required by the DBMS.

The present invention will be better understood with the aid of the description, given below for purely explanatory purposes, of an embodiment of the invention, with reference to the appended figures:

FIG. 1 illustrates a conventional architecture of communication between an application and a database;

FIG. 2 illustrates a communication architecture similar to that of FIG. 1 and comprising the elements necessary for the application of the invention;

FIG. 3 illustrates the different means of access to a traceable organized database provided with a system according to the invention.

The management of persistent data of a company (or an organization in the broad sense) is generally entrusted to specific software called a DBMS. Computer applications offer users interactive ergonomic means of viewing and changing the data in the company's database by communicating with the DBMS. In the following paragraphs, we recall the main characteristics of this architecture in order to set out the framework of our process for monitoring data evolution and to fix a minimum vocabulary. The persistence manager required by our system provides for the storage of data and their reconstitution in memory in accordance with their structure (defined as a set of attributes) and the values entered or calculated. The main relational DBMSs on the market (but also object, network or hierarchical ones) are good candidates for the role of persistence manager. This compatibility is also an advantage of our process, which can thus take advantage of the software base already installed in the company. Consider for simplicity, and only as an example, the use of a relational DBMS. It allows the representation of data in the form of tables (or relations). The columns indicate the attributes (or fields). Each column is characterized by a domain (integer, character, date, float, etc.) and other possible information such as the maximum size (for character strings). Certain attributes (one or more) constitute the key or identifier of the record. In the following figure, we have represented a table and underlined the keys. Each row of the same table represents a new record (or tuple) of uniform structure. Each cell represents the value of an attribute. For example, "aaa" is the value of the Attribut1 attribute of the first record, whose key is 1001.

Table

The data is inserted, read, modified and deleted through a data manipulation language (SQL for example).

The persistence manager also allows the definition, the consultation and the evolution of the data structure, also called data schema. Thus, the tables can be defined, deleted or restructured. In the latter case, columns can be added or removed. Sometimes it is even useful to change the domain of an attribute, or other similar characteristics, which may involve implicit or explicit processing of the conversion of the data concerned.

Whatever the physical representation of the data, the table is the logical reference for data representation. Thus, applications generally "see" data in the form of tables. It is important to emphasize that our system wants to preserve this logical representation in order to ensure the greatest compatibility with existing applications. For example, after having requested the connection to a particular database, an application can address a persistence manager with a request of type "select * from client" and receive in exchange a set of data allowing the reconstruction of the data in tabular form.

Finally, it should be noted that a database represents a coherent state of the real world it models. The data in the database evolves in discrete steps triggered by events, through operations (insertion, update or deletion) generally grouped into transactions. These are characterized by special properties called ACID (Atomicity, Consistency, Isolation and Durability) which guarantee a certain level of quality.

Ensuring the traceability of persistent data amounts to providing means allowing upstream and downstream monitoring of the data evolution process.

The process of data evolution is a generally unpredictable result of the execution of elementary operations which read, transform and write data repeatedly, often giving rise to multiple and complex interferences which make it difficult and often impossible to follow them. Ensuring the traceability of the process amounts to being able to go back at any time to the origins (beginnings) of the process, to recover the values of the original data, and to monitor and understand their consequences in terms of the impact of changes over the course of operations. In terms of the quality of the information, traceability is very valuable because it guarantees the conformity of the result of an operation with the set of input data to which it was applied.

To better understand the extent of its scope, we present a classification of traceability according to increasingly advanced levels:

• the first level of traceability, which can be described as elementary, is that of the representation and storage of data. It is a question of describing the structure, then of storing and identifying the data, whether it is an order, an article or even a mechanical part, in order to be able to find it later. This type of functionality is already provided by specialized software, called Database Management Systems (DBMS). The process of evolution is manifested by the successive application of elementary operations such as reading, insertion, updating and deletion. These elementary operations are generally grouped into transactions in order to maintain the consistency of the data under conditions of concurrent use or even fault recovery. At this level, updates have as a natural consequence the loss of existing values following their replacement by new values, since, by convention, only a single datum (with its attributes) can correspond to a given identifier. This first, elementary level of traceability is essential but largely insufficient.

• the second level of traceability authorizes data to have several versions at the same time (separate values). This improves traceability since it becomes possible to have at any time both the values preceding and those following the execution of an operation or a process, which makes it easier to understand the evolution. Versioning introduces a valuable quality, since irreversibility is no longer inescapable (data evolution is allowed, without loss of current values). In addition to successive versions, there are alternative versions. It often happens that a user, after tracing a process back, wants to make some changes to a previous state of the data. In these cases, the versioning mechanisms allow alternatives to be taken into account, that is, branches of evolution which allow several possible sequences from the same state of the database. An advanced traceability system must therefore integrate this aspect, especially since a new branch makes it possible not to destroy the previous ones and thus to preserve the traceability of the previous processes.

There are many works that take into account data whose values change over time. The domain of temporal databases clearly distinguishes the axis of validity time from that of transaction time. The period of validity makes it possible to specify, for example, that a tariff is valid for such and such a date. This information is completely independent of the date of the update which stores it in the database, which is located in the so-called transactional time. Due to the specific nature of their issues, the mechanisms for taking validity time into account include query and update solutions [Publication by R. Snodgrass, "The temporal query language TQuel", ACM Transactions on Database Systems, Association for Computing Machinery, New York, USA], offer operators dedicated to handling intervals (between, before, etc.), and specifically deal with cases of updating time intervals for the same data which involve merging or division [European patent application EP 0 984 369 (Fujitsu)]. Furthermore, the representation and display of the different versions in turn requires specific solutions [PCT patent application WO 92/13310 (Tandem Telecommunications Systems)] which facilitate understanding of the evolution of individual data, without being concerned with branches or with the global criterion of collective coherence of the database data in the versioning space. Indeed, these aspects lie outside the problem of traceability, which places on versioning a series of requirements of its own that still remain unresolved.

Finally, let us cite archiving and restoring as mechanisms allowing previous states of the database to be found. However, they are clearly inadequate for the problem of traceability, because the granularity at which they follow the evolution is too coarse, which leads to insurmountable drawbacks in response time and storage space. In conclusion, versioning is also essential to ensure traceability but, as we will see below, it is still insufficient.

• the third level of traceability is that of operations. Tracing an operation means leaving a persistent trace of the execution of said operation, allowing an even better understanding of how the data evolves. We can thus better explain the evolution of an order between two versions if we know, for example, that there was a discount operation on the total price. Most DBMSs have logging mechanisms that allow viewing of the operations performed at the elementary level. To be understandable by users, this information must be correlated with high-level operations, but the basic problem is that the log entries do not have the same persistence cycle as the data. Thus, the log is generally located outside the database and is regularly purged by the administrator. PCT patent application WO 02/27561 (Oracle) provides an alternative solution to this problem, by proposing internal storage (in the database) of transactions and of the information for cancelling their effects (undo), which allows any previous state of the database to be recovered by executing, in reverse order, the inverse of the operations which have taken place since. Although interesting, this technique can be very expensive in execution time because, to find a precise version of a datum, it undoes all the operations that have taken place since, including those that do not concern it. In addition, it is not suitable for obtaining the list of all the versions of a data item. Finally, it prohibits any update from a previous state of the database, which rules out variants and alternative branches of evolution. As we will see later, in the present invention the inventors have opted for the opposite strategy: upon receipt of a request, the present invention transforms it and then executes it on the versioned data. Finally, note the need for higher-level information, for example provided by applications, in order to link the semantics of the applications (application of a discount to an order) with that of the DBMS (update of the "amount" attribute of the order).

• the most advanced level of traceability is that of causality. It aims to materialize information transport links at the most basic level (the finest grain). For example, if an operation O reads the attribute A of the data X, reads the attribute B of the data Y, adds the two and stores the value thus obtained in attribute C of data Z, a causal link makes it possible to reconstruct this transport of information through the different versions of data X, Y and Z, as well as the various executions of operation O. This valuable information makes it possible to understand the details of the evolutions, to explain transitively the origins of the modifications and to detect the operations which have to be redone when the original data evolves. It is especially important because, unlike journaling techniques, it gets rid of the sequential constraint of operations to focus on the dynamic dependencies generated by causation. We can thus skip over, for example, thousands of operations that do not interfere with the data of interest. Finally, it is also extremely valuable in simplifying the merging of data located in different branches and in better identifying real conflicts.
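The causal level can be illustrated by a minimal sketch that records source-destination links between data versions and follows them transitively. The class `CausalLog`, its method names and the representation of a version as a (data, attribute, event) tuple are assumptions made for this illustration only; the text does not fix how such links are stored.

```python
from collections import defaultdict


class CausalLog:
    """Hypothetical store of source -> destination links between versions."""

    def __init__(self):
        # Maps a source version to the versions computed from it.
        self.destinations = defaultdict(set)

    def record(self, sources, destination):
        """Note that `destination` was computed from every version in `sources`."""
        for source in sources:
            self.destinations[source].add(destination)

    def impact(self, version):
        """Return all versions transitively derived from `version`."""
        affected = set()
        stack = [version]
        while stack:
            current = stack.pop()
            for derived in self.destinations.get(current, ()):
                if derived not in affected:
                    affected.add(derived)
                    stack.append(derived)
        return affected


# The operation of the example above: read A of X and B of Y, store the sum in C of Z.
log = CausalLog()
log.record([("X", "A", 1), ("Y", "B", 2)], ("Z", "C", 3))
# A later, invented operation that derives attribute D of W from C of Z.
log.record([("Z", "C", 3)], ("W", "D", 4))

to_redo = log.impact(("X", "A", 1))
```

Here `to_redo` contains the versions of Z and W, that is, the results that would have to be recomputed if attribute A of X changed, regardless of how many unrelated operations were logged in between.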

A particular case of evolution operation concerns the evolution of the schema, which consists in changing the structure of the data without loss of information [Roddick93 - Publication "A taxonomy for schema versioning based on the relational and entity relationship models", Roddick, J.F., Craske, N.G. and Richards, T.J., 1993]. As with the data, the monitoring of the evolution of their structure will be better ensured if the mechanisms for versioning, operation monitoring and causal tracing also apply to the information describing the structure. Special measures for organizing data and metadata [Publication "Extracting delta for incremental data warehouse maintenance", Ram P. et al., Data Engineering, 2000] will be required.

One of the objectives of the present invention is to propose a minimally intrusive and progressive method of organizing a digital database in a traceable form. The aim is to ensure the successive levels of traceability described above without imposing the redesign of existing applications.

In other words, the objective pursued by the invention is to provide computer applications and their users with the ability to track data accurately throughout its evolution, by tracing its history comprehensively, both at the individual level (intermediate versions and links of succession) and at the collective level (triggering events and dynamic links of interdependence resulting from the interactions between the versions of the data), and by positioning it within the coherent framework in which it originally evolved.

It is therefore a question of providing causal links at an elementary level, where one can easily follow the causal flow of transformations and check the validity of each intermediate operation on the basis of the input data, the processing applied and the resulting data, so that the reconstruction of any past state is immediate.

In addition, the method according to the invention uses a flexible architectural framework, as unconstraining and non-intrusive as possible, in order to give the proposed method very broad applicability and the widest possible compatibility with current methods of storing and manipulating data.

In order to monitor the evolution of a so-called “main” database, the method according to the invention ensures that it represents not just one but all the necessary successive and/or alternative coherent states of the real world represented, throughout its evolution, while preserving the ACID properties.

For this purpose, the architecture implemented for the invention is illustrated in FIG. 2 and is constituted as follows:

• a journal (J) organized in the form of an “internal historization database” consisting of a table or a set of tables dedicated to monitoring evolution and based on a universal storage mode with a stable schema (independent of the logical representation of application data) and particularly suitable for on-the-fly data reconstruction;

• a transaction and event monitor (M) capable of detecting any request for changes in values and structure transmitted to the database, which gradually adds to the dedicated log the entries characterizing the elementary evolution of the data (identity, attribute, value, triggering event and dynamic dependencies);

• a module (R) for on-the-fly reconstitution of the state of the database according to a target event; the system is provided for this purpose with a cursor (C) dedicated to the selection of the desired state;

• special case: in some cases, it may be useful to materialize the view of the so-called "current" or "main" base in the form of tables with a specialized structure, for example to allow high performance and total compatibility with existing applications (in particular to allow the use of stored procedures and triggers that an application may require to function properly).

Optionally, the architecture also includes:

• a system for monitoring the compliance (SC) of the applications with the states of the database and its schema

• tools (I) for automatically inoculating applications with instructions dedicated to monitoring dynamic dependencies (capturing data flows)
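The role of the reconstitution module (R) and of the cursor (C) can be sketched as follows. The sketch assumes journal entries of the form (record identifier, attribute identifier, event identifier, value), with attribute 0 standing for the record itself and the value 0 used as a deletion code, as in the journal layout and the deletion example given further on. It also assumes a single linear sequence of events (no branches). The function name `reconstitute` and the record identifier 7 are hypothetical.

```python
DELETED = 0  # assumed deletion code stored in the value field


def reconstitute(journal, cursor):
    """Rebuild {record_id: {attribute_id: value}} as of the event `cursor`."""
    # Keep, for each (record, attribute) pair, the latest value not after the cursor.
    latest = {}
    for record_id, attribute_id, event_id, value in sorted(journal, key=lambda e: e[2]):
        if event_id <= cursor:
            latest[(record_id, attribute_id)] = value

    # A record exists if its own entry (attribute 0) is not the deletion code.
    state = {}
    for (record_id, attribute_id), value in latest.items():
        if attribute_id == 0 and value != DELETED:
            state[record_id] = {}
    for (record_id, attribute_id), value in latest.items():
        if attribute_id != 0 and record_id in state:
            state[record_id][attribute_id] = value
    return state


journal = [
    (7, 0, 1, 53),     # insertion of a record in table 53
    (7, 1, 1, 1001),   # attribute 1 set at event 1
    (7, 2, 1, "aaa"),  # attribute 2 set at event 1
    (7, 2, 2, "bbb"),  # attribute 2 updated at event 2
    (7, 0, 3, 0),      # record deleted at event 3
]

before_update = reconstitute(journal, cursor=1)
after_update = reconstitute(journal, cursor=2)
after_delete = reconstitute(journal, cursor=3)
```

Moving the cursor only changes which journal entries are taken into account; no entry is rewritten, and unmodified attributes (here attribute 1 at event 2) are never stored a second time.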

The event log (J) (or the internal history database) is made up mainly from a table with a structure independent of that of the application data. The columns are:

• a unique identifier of the logical table record concerned by the journal line, belonging to the primary key

• an identifier of the attribute in the schema, or 0 for the record itself, belonging to the primary key

• a universal event identifier, automatically incremented, also belonging to the primary key of the log and corresponding to the state of the main database

• a value field dedicated to the storage of values
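By way of illustration, the four columns listed above could be declared as follows, here with SQLite through Python's standard `sqlite3` module. The table and column names are assumptions made for this sketch, and the generation of the event identifier is left to the caller, since SQLite's automatic increment applies only to a single-column integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE journal (
        record_id    INTEGER NOT NULL,  -- identifier of the logical record
        attribute_id INTEGER NOT NULL,  -- attribute in the schema, 0 = the record itself
        event_id     INTEGER NOT NULL,  -- event identifier, i.e. state of the main database
        value,                          -- untyped value field
        PRIMARY KEY (record_id, attribute_id, event_id)
    )
    """
)

# The same cell (record 7, attribute 2) receives one line per event that changes it.
conn.execute("INSERT INTO journal VALUES (7, 2, 1, 'aaa')")
conn.execute("INSERT INTO journal VALUES (7, 2, 2, 'bbb')")
```

Because the event identifier is part of the key, successive values of one attribute coexist as separate lines, and the structure of the journal stays the same whatever the schema of the application tables.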

The role of the monitor (M) is to correctly detect and interpret each change request by adding the corresponding information to the event log (J).

Examples of changes in value

Comment: content of the value field

- insertion of a record: ID of the "Client" table

- update of an attribute: client number

- update of an attribute: client name

- deletion of a record: deletion code

In the language of exchange with an SQL database, the first three lines of the table can be the effect of the following query:

insert into Client (no_client, nom_client) values (1001, "aaa")

Such a request is treated as follows:

• parsing of the request

• retrieval from the schema of identifiers for the client table (53) as well as for the attributes "no_client" (1) and "nom_client" (2)

• insertion of the three lines in the journal
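The treatment above might be sketched as follows. This is a simplified illustration, not the patented implementation: parsing of the SQL text is omitted, the schema dictionary reuses the identifiers quoted in the text (53, 1 and 2), and two points are assumptions: that the line for the record itself stores the table identifier as its value, and that each journal line receives its own event identifier.

```python
from itertools import count

# Schema identifiers quoted in the text: table Client = 53,
# attributes no_client = 1 and nom_client = 2.
SCHEMA = {"Client": {"table_id": 53, "attributes": {"no_client": 1, "nom_client": 2}}}

journal = []                 # lines of (record_id, attribute_id, ueid, value)
_next_ueid = count(1)        # assumed: one new UEID per journal line
_next_record_id = count(1)   # hypothetical generator of record identifiers

def log_insert(table, values):
    """Add one line for the record itself, then one line per attribute."""
    entry = SCHEMA[table]
    record_id = next(_next_record_id)
    # Attribute 0 denotes the record; its value is assumed to be the table ID.
    journal.append((record_id, 0, next(_next_ueid), entry["table_id"]))
    for name, value in values.items():
        journal.append((record_id, entry["attributes"][name], next(_next_ueid), value))
    return record_id

# Equivalent of: insert into Client (no_client, nom_client) values (1001, "aaa")
rid = log_insert("Client", {"no_client": 1001, "nom_client": "aaa"})
```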

The last line can be obtained by the following statement:

delete from Client where no_client = 1001

Such a request is treated as follows:

• parsing of the request

• retrieval from the schema of identifiers for the client table (53) as well as for the "no_client" attribute (1).

• recovery of the identifier of the log record having the value 1001 for attribute no 1

• insertion in the log of the last line (using the code 0 for the value).

Examples of schema evolution

create table Client (no_client int primary key)

- creation of a new table: table ID of "table"; name of the table

- adding an attribute: table ID of "attribute"; attribute name; field; primary key; table ID

alter table Client drop column no_client

- deletion of an attribute: deletion code

drop table Client

- deletion of a table: deletion code

Other cases:

- update, moving an attribute: table ID

The last example described above concerns a complex case, without equivalent in a single SQL operation. On the other hand, an interactive management tool can make it possible to derive real benefit from this characteristic.

As can be seen, each event that modifies the logical database creates one or more entries in the form of new lines (or records) in the log. This ensures that nothing is lost and that no logical deletion or update results in a physical deletion. Thus, data from the past can be recovered. One advantage of this organization is that views such as account books, whose creation generally blocks the update access of other users, can be created concurrently with updates.

We should also note the uniformity of the information storage structure: the data is stored in identical fashion, whether it is the evolution of values or that of structures. This means that from a logical point of view, it is possible to reconstruct both logical tables and their structures, on the basis of the same mechanism. In addition, the fact of including the journal in the same database as the main database makes it possible to guarantee its relative coherence by the transaction mechanism provided by the DBMS.

The reconstruction module (R) is in charge of restoring the data in logical format according to an event type parameter, from the event log (J).

For example, consider that the application wishes to obtain the data from the Client table as it was at event 854. This implies the prior selection of event 854 by the event cursor (C). Subsequently, the "select * from Client" request is transmitted to the DBMS, but transformed by the module (R) into a more complex request, obtained in the following way:

• reconstitution of the corresponding schema: the request relates to the Client table; the system must therefore verify the existence of the Client table at the historical time positioned by the target event and retrieve the attributes of this logical table (optimization is possible by keeping the schema in cache)

• retrieval of the records (journal lines whose Attribute field = 0) attached to this table that were created and not deleted "before" the event corresponding to the target state (value = 0 being the deletion code). In the case of alternatives, "before" only concerns events located on the same branch.

• retrieval of all the journal lines whose Attribute field <> 0 that are attached to the records retrieved above and that precede the target event.

• reorganization of the flow of restored data and grouping by logical recording, that is to say in our case, by client.
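The steps above can be sketched in a few lines for the simplest case of a linear history. This is an illustration under stated assumptions: journal lines are tuples `(record_id, attribute_id, ueid, value)`, the record line is assumed to carry the table identifier as its value, the target event itself is assumed to be included in "before", and the schema is passed in directly rather than being reconstituted from the journal.

```python
DELETED = 0  # deletion code stored in the value field, as described in the text

def reconstruct(journal, table_id, schema, target_ueid):
    """Rebuild the logical records of one table as of target_ueid (linear history)."""
    # Keep only the lines up to the target event, in chronological order.
    past = sorted((line for line in journal if line[2] <= target_ueid),
                  key=lambda line: line[2])
    alive = {}
    for record_id, attribute_id, _ueid, value in past:
        if attribute_id == 0:
            if value == table_id:          # creation of a record of this table
                alive[record_id] = {}
            elif value == DELETED:         # logical deletion
                alive.pop(record_id, None)
        elif record_id in alive and attribute_id in schema:
            # The most recent value before the target event wins.
            alive[record_id][schema[attribute_id]] = value
    return alive

client_schema = {1: "no_client", 2: "nom_client"}   # attribute ID -> name
example_journal = [
    (1, 0, 1, 53), (1, 1, 2, 1001), (1, 2, 3, "aaa"),  # insertion
    (1, 2, 4, "bbb"),                                   # update of the name
    (1, 0, 5, 0),                                       # deletion
]
state_at_3 = reconstruct(example_journal, 53, client_schema, 3)
state_at_4 = reconstruct(example_journal, 53, client_schema, 4)
state_at_5 = reconstruct(example_journal, 53, client_schema, 5)
```

Grouping by `record_id` corresponds to the final step of regrouping the restored flow by logical record, here by client.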

In one embodiment of the invention, it is possible to make modification requests on past states of the main database so as to create a tree structure of the versions of the database processed.

In addition to values and events, the journal can accommodate invocations of operations. This can be done by representing the operations in the form of logical tables, where each operation corresponds to a logical table name and each argument corresponds to a logical attribute. By applying this correspondence scheme, the application can send to the journal (for example, via an API, "Application Programming Interface") the information necessary for the traceability of operation calls, in a manner analogous to the manipulation of logical data (this task can also be automated and entrusted to a post-processor, to the compiler, to the processor or even to the virtual machine).

add (2, 8)

Comment: invocation of the Add operation with arguments 2 and 8. In this example, 57 is the identifier of the "Add" operation and 62 is the identifier of this invocation of the "Add" operation. The journal lines correspond to:

- operation ID ("Add")

- first argument

- second argument

- return value

Operation calls are used to connect the semantics of application actions to the events recorded in the log. As we will see later, this will make it easier to position the cursor on landmarks that are significant from the user's point of view.
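The correspondence between an invocation and journal lines, as described above, might look like the following sketch. The identifiers 57 (operation "Add") and 62 (this invocation) come from the example; the attribute numbering (1 and 2 for the arguments, 3 for the return value), the starting event identifier and the function name `log_invocation` are hypothetical.

```python
def log_invocation(journal, invocation_id, operation_id, args, result, first_ueid):
    """Record an operation call as journal lines (record, arguments, return value)."""
    ueid = first_ueid
    # Attribute 0: the invocation itself, attached to the operation's logical table.
    journal.append((invocation_id, 0, ueid, operation_id))
    # One line per argument, numbered from 1 (assumed attribute identifiers).
    for position, arg in enumerate(args, start=1):
        ueid += 1
        journal.append((invocation_id, position, ueid, arg))
    # Last line: the return value, stored as one more logical attribute.
    ueid += 1
    journal.append((invocation_id, len(args) + 1, ueid, result))
    return ueid

op_journal = []
last_ueid = log_invocation(op_journal, 62, 57, [2, 8], 2 + 8, first_ueid=100)
```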

In addition, transaction validation (commit) points can be recorded in the form of operations. Indeed, it is recommended that the cursor be positioned exactly on these points and not between two operations of the same transaction; the consistency of the result depends on it. On the other hand, applications such as design assistance tools may very well benefit from intermediate states, deemed inconsistent, for explanatory purposes, and thus benefit from mechanisms of the “long transactions” type.

Finally, note that the operations are linked by references (not shown in the tables) to their parent operations, so that their belonging to the execution of a higher-level operation can also be traced. Thus, it will be possible to reconstruct the nesting of the operations, from the elementary level of events up to the level of transactions, passing through as many invocation levels as necessary for the applications.

The invention also relates to the materialization of causal links.

The flow of causal dependencies must be dynamically constituted by the read and update operations respecting the following rules:

Data manipulation must systematically carry, alongside the data read, the references to their origin, and transport them throughout the data flow and the control flow. The application must therefore take care of this aspect, by adding to each data handling instruction its equivalent for transporting references, for example via an API. This task can be automated by a post-processor and/or by processor or virtual machine extensions.

When inserting a physical datum, the references of the flow that supplied it must be stored in the form of a list of elements of type ID-attribute-UEID, alongside the value attribute, and this for each physical log record. An empty list corresponds to the introduction of a value from outside the system (for example, an input made by a user through a Human-Machine Interface).

Comment on the example: the value of attribute 4 was formed from attributes 2 and 3.

The storage of the sources in the journal can very well be carried out by means of an additional journal (or sub-table), organized in a tabular manner, for reasons of performance optimization, according to the usual techniques of the database discipline.

The interpretation of the flow is simple: the value of a data item depends on the values of the source data read at the times referenced by the corresponding UEID events. We can therefore say that the sources materialize the elementary causal links.

The invocation of operations can be traced in the same way. Here is an example of the call to the Add operation (mentioned above) with the arguments Client.Attr3 and the constant 7.

Comment:

- operation ID ("add")

- first argument

- second argument

- return value

The validity of operations can be checked against the data in force. For example, if the value of the Attr3 attribute of Client 110 changes after the execution of the "add" operation, the results returned by this operation can no longer be considered valid. The operation is then said to be called into question ("questioning"). In the case of an evolution without alternatives, this can be verified by a simple comparison of UEIDs between the sources of the arguments and the latest values of the referenced sources.
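For an evolution without alternatives, the UEID comparison described above can be sketched as follows. The tuple layouts and the function names `latest_ueid` and `is_questioned` are hypothetical; journal lines are assumed to be `(record_id, attribute_id, ueid, value)` and each source reference `(record_id, attribute_id, ueid)`.

```python
def latest_ueid(journal, record_id, attribute_id):
    """UEID of the most recent journal line for a given record and attribute."""
    return max((ueid for rid, attr, ueid, _ in journal
                if rid == record_id and attr == attribute_id), default=0)

def is_questioned(journal, sources):
    """True if any source has a newer value than the one that was read."""
    return any(latest_ueid(journal, rid, attr) > ueid for rid, attr, ueid in sources)

# Attr3 of Client 110 was written at event 10 and read by the "add" operation.
validity_journal = [(110, 3, 10, 5)]
add_sources = [(110, 3, 10)]
before_update = is_questioned(validity_journal, add_sources)

# Attr3 then changes at event 20: the earlier result is called into question.
validity_journal.append((110, 3, 20, 6))
after_update = is_questioned(validity_journal, add_sources)
```

With branches, comparing raw identifiers is no longer sufficient, since only events on the same branch count as "later".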

In order for this traceability information to be fully effective for the user, it is useful to minimize the constants, that is to say the values entered "arbitrarily". The application must therefore favor identification by choice list, by pointing, by drag-and-drop, or by any other technique which both improves the ergonomics of the application and implicitly ensures continuous monitoring of the information flow. In practice, these identification techniques are already widely used because they provide the benefits of static referencing commonly expected in databases.

This characteristic of the process also makes it possible to set up an automatic optimization system which, on the basis of the systematic verification of the validity of the sources, returns the previously calculated result without actually re-executing the operation. The implementation of such a solution involves the introduction of references to the calling operations (which can be done through additional arguments) and is worthwhile provided that the verification time is less than the execution time (performance statistics can be maintained and exploited for this purpose).

Automatic notification of "questioning" may be implemented on the basis of the information concerning the validity of the versions of the data in relation to the flows. Thus, for a given operation, class of operations, target or source, flow consistency alerters will be able to notify the applications by synchronous or asynchronous messages.
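One way to picture this optimization is a cache keyed by the operation and its sources, which is reused only while the UEIDs of the sources are unchanged. This is a sketch, not the method of the patent: the function `run_or_reuse`, the cache layout, and the convention of giving a constant the pseudo-UEID 0 are all assumptions made for illustration.

```python
def run_or_reuse(cache, op_name, fn, args):
    """args: list of (source, ueid, value). Returns (result, reused_flag)."""
    key = (op_name, tuple(source for source, _, _ in args))
    ueids = tuple(ueid for _, ueid, _ in args)
    hit = cache.get(key)
    if hit is not None and hit[0] == ueids:
        return hit[1], True            # sources still valid: no re-execution
    result = fn(*(value for _, _, value in args))
    cache[key] = (ueids, result)       # remember which versions were read
    return result, False

def add(a, b):
    return a + b

cache = {}
attr3 = (("Client", 110, 3), 10, 5)    # source, UEID of the value read, value
seven = (("constant", 7), 0, 7)        # constant, hypothetical pseudo-UEID 0
first = run_or_reuse(cache, "add", add, [attr3, seven])
second = run_or_reuse(cache, "add", add, [attr3, seven])

attr3_updated = (("Client", 110, 3), 20, 6)   # Attr3 modified at event 20
third = run_or_reuse(cache, "add", add, [attr3_updated, seven])
```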

The re-execution consists of a new explicit invocation of a given operation on the model of a previous invocation, but on the basis of new values. In all cases, it will give rise to new values for the data, the operations and the traced sources.

The method according to the invention is specially designed to manage, in operation, continuous historization as changes occur and restoration on the fly. In addition, the management of storage volumes is facilitated and optimized by a set of factors:

• only attribute values that change are stored (redundancy is thus minimized)

• the additional storage volume required increases linearly with the number of attributes modified or deleted and does not depend on the volume of data inserted in the database; this factor allows very advantageous use for a very wide spectrum of applications

• finally, well-targeted purges can be performed on the data marked as called into question by the source-destination traceability links, but this operation must be controlled by the applications according to the semantics of the questioning.

For simplicity, the previous example made the implicit assumption of a sequential organization of events, and therefore of the states of the main base (a total order). Thus, to check the validity of a source, the simple comparison of universal event identifiers (UEID) was mentioned as a solution.

In reality, our process allows a wide choice of version organization, such as for example:

• Tree structure: each event has a parent event. The value of a datum associated with an event can be obtained by logically climbing through the parents up to the nearest value.

• Directed acyclic graph: analogous to the tree structure, this organization allows a version to have several different parents. Ambiguities of resolution can be settled by predefined rules, based on criteria of priority of the branches or on any other characteristic of the data (its type, etc.)
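For the tree organization, the "logical ascent of the parents to the nearest value" can be sketched as a simple loop. The dictionaries `parents` and `values` and the function `resolve` are hypothetical structures chosen for illustration; the directed acyclic graph case, with several parents and priority rules, is not covered here.

```python
def resolve(parents, values, event, key):
    """Climb from an event through its ancestors until a value for key is found."""
    while event is not None:
        if (event, key) in values:
            return values[(event, key)]
        event = parents.get(event)     # move to the parent event
    return None                        # no value on this branch

# Events 3 and 4 are two alternatives that both branch off event 2.
parents = {1: None, 2: 1, 3: 2, 4: 2}
values = {(1, "x"): "a", (3, "x"): "b"}

on_branch_3 = resolve(parents, values, 3, "x")   # value set on this branch
on_branch_4 = resolve(parents, values, 4, "x")   # inherited from event 1
```

The update made at event 3 is invisible from event 4, which illustrates why "before" only concerns events located on the same branch.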

The evolutions of the different branches can be merged by calling on the re-execution of operations.

Virtual versions are predefined event branches that allow the creation of parallel configurations that can simultaneously benefit from events applied to one or more so-called “reference” branches. Other features:

• Possible conflicts are avoided by separating the events, according to their nature, into reference branches, following the model mentioned for the directed acyclic graph organization.

• The materialization of these configurations is not real because the events are not physically duplicated (the propagation is logical).

The architecture implemented for the realization of the invention may also include the following modules:

• a system for monitoring compliance (SC) of the applications with the states of the database and its schema. The principle is based on the registration of a version identifier of the application in order to declare a level of compatibility with the state or states corresponding to the schema of the main database

• inoculation tools (I) that automatically insert into applications instructions dedicated to monitoring dynamic dependencies (capture of data flows): pre/post-processor or extended virtual machine

• visual components specialized in navigation and exploration of the base states (not shown).

The invention can be implemented in several ways depending on the context in which it is integrated into an application.

Figure 3 presents an architecture which allows three levels of integration of traceability, from bottom to top:

Existing applications can continue to access the (so-called “main”) database in the same way. The database can either keep its original structure and redirect access to an associated log (called internal database), or evolve towards a physical log-type organization and offer views or a driver responsible for translating requests and results. Existing applications can be very easily provided with a "cursor" provided that access to the data is centralized (which is generally the case, for example through a single driver). In this case, the application may offer automatic means of access to the data in the database (now implemented in the form of a log) and allow users to activate a cursor which will position the readings on the desired event marker. Slight adaptations may be needed in order to match the granularity of the events with the semantics of the application.

New applications, entirely built on the basis of trace generation inoculation technologies, will implicitly benefit from the most advanced level of traceability offered by this process, including exhaustive monitoring of the evolution of data and their structure. For the monitoring of the evolution of the applications to be ensured at the same level, it will suffice to use declarative techniques for representing the source code, to entrust it to the same journal and to have it handled by an assembly tool itself provided with a traceability module according to this same process.

This architecture gradually achieves ever higher levels of traceability of persistent data:

• initial: representation and persistence (essential, prior), provided by the initial persistence system

• event logging (useful, short-term failure recovery, but poses a problem of rapid reconstruction of past states)

• historization and versioning (useful, because the stored values are multiple and can include variants, but this functionality generates reconstitution problems in a mode compatible with the initial mode)

• structural evolution: the follow-up of the evolutions of the data and the schema of the main database, compatible with the initial mode

• causal dependence: the detection of dynamic dependence flows and causal links between the data in the logging database

The use of branches offers the possibility of creating alternatives for the evolution of the database. At the same time, this raises new problems with regard to traceability. Suppose that, after the separation of branches A and B, the data X is modified in branch A through the operation O. One may then wish to carry its new value over into branch B, as if it had had this value at the time of the separation of the branches. This operation, called refreshing, is very useful in the many cases where institutional reference data are received at more or less regular intervals. Their integration can then pose problems of interference with the operations carried out in the meantime. For example, if no operation having the data X as source or destination has been carried out in branch B in the meantime, we can safely consider that there is no impact. Otherwise, it will be necessary to decide (explicitly or implicitly) which operation has priority and redo the others. These conflicts are easily detectable thanks to the dynamic dependency links. The associated semantics will be provided by the operations that caused these dependencies. A simple comparison of the universal identifiers of the operation traces makes it possible to evaluate precedence and to confirm or invalidate it. The user (or the application, through a system of predefined rules) can thus make an informed decision. The case of branch merging is quite analogous. Note that this technique is more interesting than the early locking of data since, in many cases, future operations are not predictable and their target data even less so. The possibility of creating branches is also a means of avoiding conflicts, at least temporarily, and of postponing their resolution.

Virtual branches, which are by definition permanently refreshed by their “parent” branches, automatically benefit from the data updates in their parent branches, including disconnection operations (creation of new branches), which take place (virtually, of course) at the same time on the virtual branches. For example, if branch B is virtual, then any operation carried out on branch A is automatically reflected on branch B. In addition, if we create a new branch A2 from A, this will have the effect of creating a similar sub-branch B2 from B. It is important to underline the virtual character of these refreshes: in reality, no processing is actually carried out. The only effect is that the next request on branch B will have an enriched result (which takes into account the refreshed data). Finally, note that in the event of automatic propagation, there is no automatic conflict resolution, unless rules have been predefined. In some cases, it can be decided in advance that, by default, what has been explicitly modified in the virtual branch has priority over the data generated by refresh.

Merging complex data is both more sophisticated and more realistic, since most often the major decision criterion for choosing versions in conflict resolution is the context. Consider that the data X is an order and that the data Y1 and Y2 are two of its order lines. If a new price for the article Z1 is proposed in the “parent” branch, then propagated into the branch in question, we must decide whether it calls into question the value of the order X, knowing that the line Y1 refers precisely to article Z1. The answer will be given by the management rule in force for orders. Such a rule could be expressed for example in the following form: "if the order is in the paid state, the order remains intact; otherwise, price updates apply immediately". Note that this rule does not have to take into account the notions of version, branch or even causal trace, which once again underlines the very low level of intrusion of our process.

In conclusion, the availability of causal traces makes it possible to configure in a more detailed manner the different possibilities of fusion, while scrupulously respecting the processes and while providing irrefutable proof of this respect.

The spectrum of applications of the invention covers most of the cases where it is useful to follow the evolution of persistent data, from management applications to file management systems, via design tools based on repositories, or beyond the needs for persistence, provided that monitoring the evolution is useful. The invention is described in the foregoing by way of example. It is understood that a person skilled in the art is able to carry out different variants of the invention without going beyond the scope of the patent.

Claims

1. Method for organizing a digital database in a traceable form, comprising steps for modifying a main digital database by adding or deleting or modifying a record in the main database and steps for reading the main database, characterized in that the step of modifying the main database comprises an operation of creating at least one digital record comprising at least: the unique digital identifiers of the relevant records and attributes of the main database, a unique numerical identifier of the state of the main database corresponding to said modification of the main database, the elementary values of the attributes assigned to them through the elementary operations, without storing the unmodified attributes or records, and adding that record to an internal historization base made up of at least one table, and in that the reading step relating to any final or previous state of the main database consists in receiving (or intercepting) an original request associated with the unique identifier of the targeted state, in carrying out a transformation of said original request to construct a modified request for addressing the historization base comprising the criteria of the original request and the identifier of the targeted state, and in reconstituting the record(s) corresponding to the criteria of the original request and to the state in question, said reconstitution step consisting in retrieving the elementary values, contained in the records of the historization base, corresponding to the criteria of the original request [in order to reduce the storage capacity requirements and the processing times].

2. Method for organizing a digital database in a traceable form according to claim 1, characterized in that said records of the historization base also contain references to other records of the internal database, in order to specify the dynamic dependency links of source-destination type constituting the causal flow of interference between the versions of the data.

3. A method of organizing a digital database in a traceable form according to any one of the preceding claims, characterized in that said operation of modifying the main database is a logical operation and that said operation of adding to the historization base consists of adding: a record identifying the state of the base corresponding to the logical operation, as many records as there are parameters of the logical operation, and a record for the possible result of the logical operation, and of specifying by a relationship the grouping of operations from the elementary level of modification to the level of the transaction, passing through the number of semantic levels necessary for the applications.

4. Method for organizing a digital database in a traceable form according to any one of the preceding claims, characterized in that the main database contains one or more table (s), organizing the evolution links between the identifiers of the successive and alternative states of the main database, intended for organizing the records of the internal database.

5. Method for organizing a digital database in a traceable form according to claim 4, characterized in that said one or said tables of evolution links between the states of the main database contain records specifying the correspondence rules between the records in the internal historization database and the states of the main database.

6. Method for organizing a digital database in a traceable form according to one of claims 4 to 5, characterized in that said reading operation consists in determining said state of the main database by referring to said identifiers and tables of evolution links between the states of the main database.

7. Database management architecture using the interrogation method of any one of the preceding claims, characterized in that an application interrogating the main database can specify the state of the desired main database.

8. Database management architecture according to claim 7, characterized in that said application can carry out modifications on any state of the main database, giving rise, in the case of an attempt to modify a previous state, to the creation of new alternatives of evolution of the main database whose data will be managed by the same internal historization base.

9. A method of organizing a digital database in a traceable form according to any one of claims 3 to 6, characterized in that the dependency links serve as criteria for questioning said operations already carried out.

10. A method of organizing a digital database in a traceable form according to any one of claims 3 to 6, characterized in that the updates carried out on different branches may be integrated or merged in the context of a new state "inheriting" from said branches.

11. A method of organizing a digital database in a traceable form according to any one of claims 3 to 6, characterized in that the cases of evolution of data structure of the main database are treated as particular cases of evolution of the data of said database, provided that the structure / schema of said main database is described in the manner mentioned for the data, as a dictionary.

12. A method of organizing a digital database in a traceable form according to any one of claims 3 to 6, characterized in that the historical database is explored and interrogated by applications through the native mode of the DBMS in order to obtain information such as for example all the historical values of an attribute and all the (dynamic) incidences of any update and to navigate through the versions and flows dynamic dependency and this in a conventional way, according to the interrogation language in force, required by the DBMS.