WO2021230771A2

WO2021230771A2 - Method of piece data synchronization describing a single entity and stored in different databases

Info

Publication number: WO2021230771A2
Application number: PCT/RU2021/050126
Authority: WO
Inventors: Vitaly SATTAROV; Petr Emelianov; Alexey VORONIN
Original assignee: Ubic Technologies Llc
Priority date: 2020-05-12
Filing date: 2021-05-06
Publication date: 2021-11-18
Also published as: RU2020116939A3; RU2020116939A; WO2021230771A3

Abstract

Method of piece data synchronization describing a single entity and stored in different databases, comprising the stages: a) the current ID of the object is distributed to all source parties involved in the computation and to the operator party, while the orchestrator party has no information about whether a particular source party has data identified by the current object ID; b) each source party receiving the current identifier has all the information about its own set of identifiers; c) the source parties follow the MPC protocol and perform a computational algorithm; d) the result is formed as secret shares divided among all the source parties involved in the process; e) after the algorithm completes, the secret shares of the result are transferred to an independent operator party.

Description

Method of piece data synchronization describing a single entity and stored in different databases

The present invention relates to the field of computing, in particular to a method of synchronizing pieces of data that describe a single entity, are stored in different databases and are controlled by different organizations, in order to split all available information about this entity into secret shares among all involved organizations and to perform joint confidential calculations.

US 20190372760 A1 is known from the prior art; it describes how to simplify collaborative computation for a large set of parties that share their secrets among a smaller set of parties.

However, the disadvantage of this technical solution is the low level of security when synchronizing data parts.

WO 2019202586 A1 is also known from the prior art; it describes an optimization in which a polynomial function is computed under an MPC protocol in a single round. More precisely, all the data needed for the calculation are exchanged in one step.

However, the disadvantage of this technical solution is that it describes an MPC protocol rather than a method of data synchronization that is independent of the protocol.

WO 2018211676 A1 and WO 2019202586 A1, which describe MPC protocols, are also known from the prior art.

The claimed technical solution differs from the known ones in that it does not depend on a particular MPC protocol: the parties can repeatedly compute the function f(x₁, ..., xₙ) on different inputs x₁, ..., xₙ, securely passing the arguments in the correct order.

The closest prior art to the claimed invention is application WO 201392916 A1, which describes identifying data, including biometric data, by binarizing and filtering them, implemented using MPC.

At the same time, no known technical solution discloses all the features of the claimed technical solution, because none of them provides secure synchronization of data parts that describe a single entity, are stored in different databases and are controlled by different organizations, in order to split all available information about this entity into secret shares among all involved organizations and to perform joint confidential calculations.

The technical problem solved by the claimed technical solution is the creation of a method for securely synchronizing parts of data that describe a single entity and are stored in different databases, as set out in the independent claim.

The technical result is the secure synchronization of data parts that describe a single entity but are stored in different databases controlled by different organizations, which makes it possible to split all available information about this entity into secret shares among all involved organizations and to perform joint confidential calculations.


The preferred embodiment of the technical solution is a computer-implemented method for the secure synchronization of pieces of data that describe a single entity and are stored in different databases, comprising the stages:

a) the current identifier is sent to all source parties involved in the calculation, as well as to the operator party, while the orchestrator party does not have any information about whether a particular source party has data identified by the current identifier;

b) each source party receiving the current identifier has all the information about its own set of identifiers and is able to check whether this set includes the current identifier;

c) the source parties proceed to execute the computational algorithm, and the result of the calculation establishes whether or not the object identified by the current identifier belongs to a target group.

Identifiers can be represented as sequences of characters or bits, as graphics, as audio information, or as biometric data.

Keys that identify parts of data from one source and are suitable for intersecting with data parts from other sources may have commercial or other value for that source. Such keys are often values of a hash function computed from the phone numbers of the customers of the organization acting as the data source. Although hash functions are in theory computationally irreversible, disclosing such hash values generally amounts to disclosing the source's contact information: the space of possible phone numbers is small enough that an adversary can hash every candidate number and match the results. On the other hand, such an intersection is necessary for joint computing and analytics that use data about a single object of the subject area stored in different sources.
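To illustrate why disclosing such hash values amounts to disclosing contact data, the following minimal Python sketch reverses an unsalted SHA-256 hash of a phone number by enumerating candidates. The key scheme (SHA-256 over the number written as a plain string), the prefix and the number of unknown digits are hypothetical choices for the example, not details taken from this application.

```python
import hashlib
from typing import Optional


def phone_hash(phone: str) -> str:
    """Hypothetical join key: unsalted SHA-256 hex digest of a phone number string."""
    return hashlib.sha256(phone.encode("utf-8")).hexdigest()


def reverse_by_enumeration(target: str, prefix: str, free_digits: int) -> Optional[str]:
    """Hash every number with the given prefix and `free_digits` trailing digits.

    Returns the phone number whose hash equals `target`, or None if none matches.
    """
    for n in range(10 ** free_digits):
        candidate = f"{prefix}{n:0{free_digits}d}"
        if phone_hash(candidate) == target:
            return candidate
    return None


if __name__ == "__main__":
    leaked_key = phone_hash("+74951234567")  # a key disclosed by a data source
    print(reverse_by_enumeration(leaked_key, "+7495123", 4))  # prints +74951234567
```

With four unknown digits the search takes at most 10,000 hash evaluations. A full national numbering plan is far larger, but for a fast hash function such as SHA-256 exhaustive search generally remains practical, which is why the keys themselves need protection.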

In order to preserve the confidentiality of the sources and ensure synchronous access to distributed data about a single object, it is proposed to use an additional party, the orchestrator, which stores all the necessary identification information and orchestrates the operations of the data source parties involved in joint calculations.

Further, the implementation of the invention will be described with reference to the attached drawings, which are presented to explain the essence of the invention and do not limit its scope. The following drawings are attached to the application:

Figure 1 illustrates the secure union of sets;

Figure 2 illustrates an example of a general circuit of a computer device.

The following detailed description of the implementation of the invention provides numerous details for a clear understanding of the present invention. However, it will be obvious to a qualified specialist in the subject area how the present invention can be used both with and without these details. In other cases, well-known methods, procedures and components have not been described in detail, so as not to unnecessarily complicate the understanding of the features of the present invention.

In addition, it will be clear from the above presentation that the invention is not limited to the above implementation. Numerous possible modifications, changes, variations and substitutions that preserve the essence and form of the present invention will be obvious to qualified specialists in the subject area.

The present invention relates to a computer-implemented method for the secure synchronization of data parts that describe a single entity and are stored in different databases.

Figure 1 shows the secure union of sets.

The first stage of orchestrating the collaborative computation is compiling the set of all available identifiers, that is, the union of the identifier sets of all objects stored by the data source parties. Having each participating party transfer its identifier set directly to the orchestrator party, which then merges the received sets into one, is undesirable, because the orchestrator party would then know which source party each transmitted set of identifiers belongs to. Instead, it is proposed to perform a secure union of sets, implemented using MPC technology: the data source parties (1) follow an interactive MPC protocol and successively build the union of their identifier sets, which is formed as a secret split into shares distributed among all the source parties involved in the process.

When the union of the sets has been computed, the source parties transfer their secret shares of the resulting union to the orchestrator party (2), which completes the calculation by combining the shares and restoring the secret. Thus, the orchestrator party obtains the complete union of the identifiers of all the source parties without any link to the specific source party. In other words, the orchestrator party receives the identifiers of all the source parties in aggregate but has no information about whether a particular identifier belongs to the set of a particular source party.
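The secure union can be sketched in Python under several simplifying assumptions that are not part of this application: all identifiers come from a public candidate universe, each party encodes its set as a 0/1 membership bit per candidate, values are additively secret-shared modulo a hypothetical prime P, and the OR of the bits is computed as 1 - ∏(1 - bᵢ) using Beaver multiplication triples from a hypothetical trusted dealer. The helper names (share, reconstruct, mul, one_minus, secure_union_shares, orchestrator_restore) are illustrative, and all parties are simulated in one process; a real deployment would rely on an established MPC framework.

```python
import secrets

P = 2**61 - 1  # prime modulus for additive secret sharing (assumption)


def share(x, n):
    """Split x into n additive shares modulo P; share j is held by party j."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % P]


def reconstruct(parts):
    return sum(parts) % P


def mul(xs, ys):
    """Multiply two shared values with a Beaver triple from a hypothetical trusted dealer."""
    n = len(xs)
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, b_sh, c_sh = share(a, n), share(b, n), share(a * b % P, n)
    d = reconstruct([(x - ai) % P for x, ai in zip(xs, a_sh)])  # opened x - a
    e = reconstruct([(y - bi) % P for y, bi in zip(ys, b_sh)])  # opened y - b
    zs = [(ci + d * bi + e * ai) % P for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    zs[0] = (zs[0] + d * e) % P  # the public term d*e is added by one party only
    return zs


def one_minus(xs):
    """Shares of 1 - x computed locally from shares of x."""
    out = [(-x) % P for x in xs]
    out[0] = (out[0] + 1) % P
    return out


def secure_union_shares(universe, party_sets):
    """Source parties' step: shares of OR(membership bits) for every candidate identifier."""
    n = len(party_sets)
    result = []
    for ident in universe:
        bits = [share(int(ident in s), n) for s in party_sets]  # each party shares its own bit
        none_has_it = one_minus(bits[0])
        for b in bits[1:]:
            none_has_it = mul(none_has_it, one_minus(b))
        result.append(one_minus(none_has_it))  # 1 - prod(1 - b_i) = OR of the bits
    return result


def orchestrator_restore(universe, union_shares):
    """Orchestrator's step: recombine shares; only the union is revealed, not who held what."""
    return {ident for ident, sh in zip(universe, union_shares) if reconstruct(sh) == 1}


if __name__ == "__main__":
    universe = ["id1", "id2", "id3", "id4", "id5"]
    parties = [{"id1", "id2"}, {"id2", "id3"}, {"id5"}]
    print(sorted(orchestrator_restore(universe, secure_union_shares(universe, parties))))
    # prints ['id1', 'id2', 'id3', 'id5']
```

Computing the OR inside MPC, rather than summing the bits, matters: a plain sum would tell the orchestrator how many sources hold each identifier, which is more than the union.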

Figure 2 shows an example of a general circuit of a computer device.

The parties jointly execute a computational algorithm (for example, a machine learning model) whose result is formed as secret shares distributed among all the source parties involved in the process. After the algorithm is completed, the secret shares are transferred to an independent party, the operator, which has no information about the source parties, the data they operate on, or the essence of the computational algorithm being executed. The operator party recovers the result of the calculations from the secret shares.

The main purpose of orchestration is to provide synchronous secure access to the data that are identified by a single identifier and stored by different source parties, and to link the result of the calculations restored by the operator party to that identifier.

To do this, the orchestrator party iterates sequentially over the identifiers obtained at the stage of the secure union of the identifier sets of all the source parties.

For each identifier, the orchestrator party performs the following algorithm:

1. The current identifier (hereinafter id1) is sent to all source parties involved in the calculations, as well as to the operator party. Note that the orchestrator party does not know whether a particular source party has data identified by id1.

2. Each source party receiving id1 has all the information about its own set of identifiers and is able to check whether this set includes id1. If it does, the identifier is marked with the flag 1 (one); otherwise, with 0 (zero). In other words, the flag indicating the presence or absence of this object on the source party's side becomes part of the feature description of this object.

3. The source parties proceed to execute the computational algorithm (hereinafter, the model). Assume that the result of the model calculation is 1 (one) or 0 (zero), its analytical meaning being that the object identified by id1 belongs (or does not belong) to a target group (hereinafter, segment 42). To calculate the model, each source party performs the following:

a. If the source party holds data about the object identified by id1, it splits these data as a secret into a number of shares equal to the number of source parties participating in the process (including itself) and distributes the shares among them. The flag indicating that this party holds data about the object identified by id1, that is, 1 (one), is also shared as a secret among the participating source parties.

b. If the source party holds no data about the object identified by id1, it generates random noise with the same dimension as its own data, splits the noise as a secret into a number of shares equal to the number of participating source parties (including itself), and distributes the shares among them. The flag indicating that this party holds no data about the object identified by id1, that is, 0 (zero), is also shared as a secret among the participating source parties.

c. Steps a and b produce an exhaustive feature description of the object, secret-shared among all the participating source parties. Parties that hold data about the object identified by id1 contribute real data and the flag 1; parties that do not contribute artificial data (random noise) and the flag 0. Because the flags are also secret-shared, no party can tell whether another source party contributed real or artificial data about this object.

d. The source parties follow the MPC protocol and calculate the model, feeding it the secret shares of the object's feature description (excluding the flags). The model result is itself secret-shared among all the participating source parties. It has analytical value if every participating party contributed real data about the object identified by id1, and it may be distorted if one or more source parties contributed artificial data (random noise). As stated above, the model result is 1 (one) or 0 (zero), depending on whether the object belongs to the target group segment 42.

e. The source parties sequentially perform MPC multiplication of the model result obtained in step d by all the flags indicating whether each source party holds information about the object identified by id1.

f. The source parties pass their secret shares of the product to the operator party.

g. The operator party restores the secret result.
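Steps 1 to 3 (a to g) can be simulated for a few identifiers in the following Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: values are additively secret-shared modulo a hypothetical prime P, multiplications use Beaver triples from a hypothetical trusted dealer, each source party holds a single binary feature per identifier, and the model is a hypothetical toy rule (the object is in segment 42 only if all binary features equal 1), chosen because it needs only MPC multiplications. A party without data contributes a uniformly random field element as noise, and all parties are simulated in one process.

```python
import secrets

P = 2**61 - 1  # prime modulus for additive secret sharing (assumption)


def share(x, n):
    """Split x into n additive shares modulo P; share j goes to source party j."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    return parts + [(x - sum(parts)) % P]


def reconstruct(parts):
    return sum(parts) % P


def mul(xs, ys):
    """Beaver multiplication of two shared values (triple from a hypothetical dealer)."""
    n = len(xs)
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, b_sh, c_sh = share(a, n), share(b, n), share(a * b % P, n)
    d = reconstruct([(x - ai) % P for x, ai in zip(xs, a_sh)])  # opened x - a
    e = reconstruct([(y - bi) % P for y, bi in zip(ys, b_sh)])  # opened y - b
    zs = [(ci + d * bi + e * ai) % P for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    zs[0] = (zs[0] + d * e) % P
    return zs


def evaluate_identifier(id1, parties):
    """Steps 2-3 for one identifier; each party is a dict: identifier -> binary feature."""
    n = len(parties)
    features, flags = [], []
    for data in parties:
        if id1 in data:  # step a: real data, flag 1
            value, flag = data[id1], 1
        else:            # step b: random noise, flag 0
            value, flag = secrets.randbelow(P), 0
        features.append(share(value, n))
        flags.append(share(flag, n))
    # step d: toy model on feature shares only (flags excluded); a product of
    # binary features is their AND, while noise makes the product arbitrary
    result = features[0]
    for f in features[1:]:
        result = mul(result, f)
    # step e: sequential MPC multiplication of the model result by every flag
    for fl in flags:
        result = mul(result, fl)
    # steps f-g: the shares go to the operator party, which restores the secret
    return reconstruct(result)


if __name__ == "__main__":
    parties = [
        {"id1": 1, "id2": 1, "id3": 1},
        {"id1": 1, "id2": 0, "id3": 1},
        {"id1": 1, "id2": 1},
    ]
    union = ["id1", "id2", "id3"]  # step 1: the orchestrator iterates over the union
    print({i: evaluate_identifier(i, parties) for i in union})
    # prints {'id1': 1, 'id2': 0, 'id3': 0}
```

Here id2 yields 0 because the toy model places it outside the segment, while id3 yields 0 because the third party has no data about it; the noise it contributed is cancelled by its zero flag in step e.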

As a result of the algorithm, the operator party receives, for the object identified by id1, a result of 1 (one) or 0 (zero), whose analytical meaning, as noted above, is that the object belongs (or does not belong) to the target group segment 42. In the case of 0 (zero), the operator party has no way to determine why the object identified by id1 does not belong to the target group: either the object genuinely does not belong to it, or there is not enough information to decide because one or more of the source parties hold no information about the object identified by id1.
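The ambiguity of a 0 result can be checked exhaustively in plaintext. The following Python sketch ignores the secret sharing and simply enumerates every combination of a binary model result and the parties' flags, confirming that the restored value is 1 in exactly one case. Treating a noise-distorted model result as 0 or 1 is a simplification made for the example: with noise it can be any value, but it is then multiplied by a zero flag anyway. The function names are hypothetical.

```python
from itertools import product


def operator_value(model_result, flags):
    """Plaintext value the operator restores in step g: model result times all flags."""
    value = model_result
    for flag in flags:
        value *= flag
    return value


def group_by_output(n_parties=3):
    """Group every (model result, flags) combination by the value the operator sees."""
    seen = {0: [], 1: []}
    for model_result, *flags in product([0, 1], repeat=n_parties + 1):
        seen[operator_value(model_result, flags)].append((model_result, tuple(flags)))
    return seen


if __name__ == "__main__":
    seen = group_by_output()
    print(len(seen[0]), len(seen[1]))  # prints 15 1
    print(seen[1])                     # prints [(1, (1, 1, 1))]
```

All fifteen combinations that contain a zero, whether in the model result or in any flag, collapse to the same output, so the operator cannot distinguish them.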

These application materials present a preferred embodiment of the claimed technical solution, which should not be construed as limiting other particular embodiments that do not go beyond the claimed scope of legal protection and are obvious to specialists in the relevant field of technology.


Claims

1. Method of piece data synchronization describing a single entity and stored in different databases, comprising the stages: a) the current ID of the object is distributed to all source parties involved in the computation and to the operator party, while the orchestrator party has no information about whether a particular source party has data identified by the current object ID; b) each source party receiving the current identifier has all the information about its own set of identifiers and can check whether this set includes the current object identifier; c) the source parties follow the MPC protocol and perform a computational algorithm, the result of which establishes whether or not the object identified by the current identifier belongs to a target group; d) the result is formed as secret shares divided among all the source parties involved in the process; e) after the algorithm completes, the secret shares of the result are transferred to an independent operator party, which restores the results of the calculations from the secret shares, while the operator party has no information about the source parties, the data they operate on, or the essence of the computational algorithm being performed.

2. Method of piece data synchronization according to claim 1, characterized in that, to perform the computational algorithm, each source party performs the following: a) if the source party holds data about the object identified by the current object ID, it adds to them a special feature with the value 1, then splits them as a secret into a number of shares equal to the number of source parties involved in the process (including itself) and distributes the shares of the secret among them; b) if the source party holds no data about the object identified by the current object identifier, it generates random noise with the same dimension as its own data, adds to it a special feature with the value 0, then splits it as a secret into a number of shares equal to the number of source parties involved in the process (including itself) and distributes the shares of the secret among them; c) as a result, an exhaustive feature description of the object is formed, secret-shared among all the source parties involved in the process; d) the source parties follow the MPC protocol and perform the computational algorithm, using as input the secret shares of the object's feature description, excluding the shares of the special features; e) executing the computational algorithm yields a result that is secret-shared among all the source parties involved in the process; f) the source parties sequentially perform MPC multiplication of this result by the values of the special features and transfer their secret shares of the product to the operator party, after which the secret result is restored.