WO2021118413A2

WO2021118413A2 - Data processing method, comprising secure multilateral computing and data analysis methods

Info

Publication number: WO2021118413A2
Application number: PCT/RU2020/050335
Authority: WO
Inventors: Vitaly SATTAROV; Peter EMELYANOV; Alexey VORONIN
Original assignee: Ubic Technologies Llc
Priority date: 2019-12-13
Filing date: 2020-11-19
Publication date: 2021-06-17
Also published as: RU2722538C1; WO2021118413A3

Abstract

The present invention relates to the field of computing, in particular to a method for processing data using secure multilateral computing and data analysis methods. The data processing method, comprising secure multilateral computing and data analysis methods, consists of the following stages: a) one or more systems participating in secure multilateral computing using a common computational algorithm synchronize object identifiers; b) one or more systems participating in secure multilateral computing synchronize the nomenclature of object attributes; c) one or more systems participating in secure multilateral computing use a common computational algorithm that performs operations on object attributes, the result of which is a list of target identifiers; e) the list of target identifiers is passed to one of the systems participating in secure multilateral computing; g) the identifier of the received list of target object identifiers is transferred to the customer system.

Description

Data processing method, comprising secure multilateral computing and data analysis methods

The present invention relates to the field of computing, in particular to a method for processing data using secure multilateral computing and data analysis methods. A technical solution is known from the prior art that describes the use of a confidential computing protocol, which allows several participants on the same platform to carry out, in the course of their interaction, computations that depend on the confidential input data of each of them, so that no participant can obtain any information about the others' secret input data (US 20170048208 A1, publication date 16.02.2017). However, this solution does not exclude the direct exchange of object identifiers between the systems participating in joint computing, and therefore cannot exclude the disclosure of the calculation results by the participating systems in terms of the properties and characteristics of the obtained result. Also known from the prior art is a system for training machine learning models on data located on different computers using a general computational scheme without the exchange of personal data. The described system uses the Secure Multiparty Computation (MPC) protocol (WO 2018174873 A1, publication date 27.09.2018). However, this solution does not protect the calculation result transmitted to the operator system against use by the operator system for purposes that do not correspond to the task set for the operator system. It also lacks the ability to synchronize identifiers and object nomenclature.

Also known from the prior art are a distributed system for executing machine learning models and a method for training machine learning models. The system contains multiple computing devices, each of which can record and save data and perform operations on a stream of such data, with the calculations on the data stream described by one or more directed acyclic graphs (international application WO 2019042200 A1, publication date 22.08.2018).

However, this solution does not use MPC protocols; it is based on data separation, training the model on the different devices used in the calculations, with each device gaining access to its own set of initial data. The method implies the use of only one type of model, namely models based on directed acyclic graphs, which rules out other types of algorithm implementation. Nor does this solution protect the calculation result transmitted to the operator system against use by the operator system for purposes that do not correspond to the task set for the operator system.

The closest prior art to the claimed technical solution is a method for confidentially calculating the number of tokens from a provided set of tokens that are present in one or more sets of records containing sets of tokens (US 20160019394 A1, publication date 01.21.2016). This solution can be used to create a secure recommendation system in which the original data (datasets containing tokens) are not transmitted to any of the parties to the calculations, thereby maintaining data confidentiality.

At the same time, no single technical solution discloses all the features of the claimed technical solution, since none of the above solutions separates knowledge of the purpose of the calculation result, the identifier of the calculation result and the content of the calculation result, while maintaining the confidentiality of the source data of the systems participating in the exchange, which are data providers, by means of secure multilateral computing methods. The technical problem solved by the claimed technical solution is the creation of a computer-implemented method for processing information about objects using secure multilateral computing and data analysis methods (described in the independent claim).

The technical result achieved by the present invention is the secure processing of information about objects by means of secure multilateral computing and data analysis methods, using end-to-end identification of objects and an operator system to exclude the disclosure of the results of calculations by participating systems in terms of the properties and characteristics of the obtained calculation result. This allows several participating systems to perform joint computations that depend on the input data of each of them, so that no participant can obtain any information about the others' input data or about the properties and characteristics of the resulting set of object identifiers. At the same time, it protects the computation result transferred to the operator system against use by the operator system for purposes that do not correspond to the task set for the operator system within the framework of the general computational problem.

In a preferred embodiment of the technical solution, the data processing method, comprising secure multilateral computing and data analysis methods, consists of the following stages: a) one or more systems participating in secure multilateral computing using a common computational algorithm synchronize object identifiers; b) one or more systems participating in secure multilateral computing synchronize the nomenclature of object attributes; c) one or more systems participating in secure multilateral computing use a common computational algorithm that performs operations on object attributes, the result of which is a list of target identifiers; e) the list of target identifiers is passed to one of the systems participating in secure multilateral computing, which is the operator system and can operate on one or more objects from the list of target identifiers; g) the identifier of the received list of target object identifiers is transferred to the customer system in such a way that the customer system is able, through the identifier of this list, to perform actions with the objects whose list is held in the operator system, while the customer system has no information about the content of this list.

Objects can be any entities of the material world or any living beings, more specifically people, as well as identifiers and groups of objects.

Identifiers can be a sequence of characters, bits, graphics, audio information, and biometric data.

Participating systems synchronize object identifiers so that all participating systems taking part in secure multilateral computing identify data related to the same object, regardless of which participating systems store information about this object.
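One way to picture synchronized identifiers is a correspondence table of the kind the description later attributes to the organizing system: one row per object, holding the local identifier that each participating system uses for it. The following is a minimal sketch under that assumption; the class name, method names and sample identifiers are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CorrespondenceTable:
    """Hypothetical table: organizer-side object ID -> {system name: local ID}."""

    rows: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def link(self, organizer_id: str, system: str, local_id: str) -> None:
        # Record that `system` knows this object under `local_id`.
        self.rows.setdefault(organizer_id, {})[system] = local_id

    def local_id(self, organizer_id: str, system: str) -> Optional[str]:
        # Returns None when the system holds no data about the object.
        return self.rows.get(organizer_id, {}).get(system)

    def ids_for(self, system: str) -> Dict[str, str]:
        # The subset of the table that concerns one participating system.
        return {org: ids[system] for org, ids in self.rows.items() if system in ids}


table = CorrespondenceTable()
table.link("obj-1", "bank", "B-778")   # sample identifiers, invented for illustration
table.link("obj-1", "isp", "U-42")
table.link("obj-2", "bank", "B-901")
```

With such a table, each participating system can be addressed using only its own identifiers, so local identifiers of one system need not be shown to another.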

Participating systems synchronize the parameters of the attribute nomenclature with one of the participating systems or an external system acting as the organizer of the attribute nomenclature synchronization process.

Participating systems taking part in joint computing use a common nomenclature of object attributes in such a way that each participating system taking part in the calculations contains a single directory of object attribute names that is the same for all participating systems.
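As an illustration of a common attribute nomenclature, the sketch below translates each system's local attribute names into names from a shared directory before any joint computation. The directory contents, the per-system mappings and the function name are all hypothetical; the application does not specify how the directory is represented.

```python
from typing import Any, Dict

# Hypothetical shared directory of attribute names, identical in every system.
COMMON_DIRECTORY = {"monthly_income", "daily_traffic_gb"}

# Hypothetical per-system mappings from local column names to shared names.
LOCAL_TO_COMMON: Dict[str, Dict[str, str]] = {
    "bank": {"inc_m": "monthly_income"},
    "isp": {"traffic": "daily_traffic_gb"},
}


def to_common(system: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename a local record's attributes to the shared nomenclature."""
    mapping = LOCAL_TO_COMMON[system]
    result: Dict[str, Any] = {}
    for local_name, value in record.items():
        common_name = mapping.get(local_name)
        if common_name is None:
            continue  # attribute has no shared name, so it stays local
        if common_name not in COMMON_DIRECTORY:
            raise ValueError(f"{common_name!r} is not in the common directory")
        result[common_name] = value
    return result
```

Attributes without an entry in the mapping are simply dropped here, which is one possible design choice: only attributes that every party can name consistently take part in the computation.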

To execute computational algorithms in multilateral computing, the participating systems use a computer program that executes the computational algorithm code.

To execute computational algorithms within the computational circuit of the participating systems, a computer program is transferred to the participating systems for auditing and for controlled execution by each participating system within its own information space.

As computational algorithms in multilateral computing, the participating systems use various techniques for working with data in digital form, including machine learning models.

Participating systems use a common directory of computational algorithms.

Based on the training samples provided by the participating systems, computational algorithms are created for usage in multilateral computing.

Participating systems pass the attributes of selected objects to the participating system, which is the organizing system for creating computational algorithms.

Based on the training samples provided by the participating systems, computational algorithms for use in multilateral computing are created using the methods of mathematical statistics and probability theory, numerical methods, optimization methods, and other methods used in the development of computational algorithms.

The invention is described below with reference to the drawings that explain it. The following drawings are attached to the application:

FIG. 1 A method for processing information about objects (also, for the communication tasks);

FIG. 2 An example of a general circuit of a computing device;

FIG. 3 A method of processing information about objects - example of its implementation for communication purposes;

FIG. 4 A method of processing information about objects - example of the customs service;

FIG. 5 A method of processing information about objects - example of operations with a computer or software.

In the following description of the invention, numerous details are given to provide a good understanding of the present invention. However, it will be obvious to a person skilled in the art how the present invention can be used with or without these details. In other instances, well-known techniques, procedures, and components have not been described in detail so as not to obscure the details of the present invention.

In addition, it will be clear from the above description that the invention is not limited to the described implementation. Numerous possible modifications, changes, variations and substitutions will be apparent to a person skilled in the art. The present invention describes a data processing method, comprising secure multilateral computing and data analysis methods.

The use of secure multilateral computing techniques refers to secure multiparty computation, in which data requests are computed in a distributed manner, without a trusted third party. The data is divided between different nodes, which calculate functions together without passing their information to other nodes.
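The application does not name a specific secure multiparty computation protocol. As a hedged illustration of the general idea, the sketch below uses additive secret sharing, a standard textbook technique chosen here as an assumption: each party splits its private value into random shares, the parties add the shares they receive locally, and only the combined partial results reveal the total. The modulus and all function names are illustrative choices.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # illustrative modulus; all arithmetic is done modulo this prime


def share(value: int, n_parties: int) -> List[int]:
    """Split `value` into n random shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Recombine shares (or partial sums of shares) into the hidden value."""
    return sum(shares) % PRIME


def secure_sum(private_inputs: List[int]) -> int:
    """Simulate n parties computing the sum of their inputs in one process."""
    n = len(private_inputs)
    # Party i splits its input; share j is what it would send to party j.
    distributed = [share(v, n) for v in private_inputs]
    # Party j adds up the shares it received; each partial sum looks random.
    partial = [sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the combination of all partial sums reveals the total.
    return reconstruct(partial)
```

This single-process simulation omits networking and any protection against parties that deviate from the protocol; it only shows why no individual share discloses a party's input.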

Machine learning is a domain of artificial intelligence associated with the development and construction of analytical models. Such models are created using the tools of mathematical statistics, numerical methods, optimization methods, probability theory, graph theory, and various techniques for working with data in digital form.

The claimed solution uses a new approach to data sharing: an environment in which project participants (participating systems) jointly solve a limited range of tasks without sharing their initial data.

As illustrated in FIG. 1, the claimed data processing method, comprising secure multilateral computing and data analysis methods (100), can be implemented as follows:

1. The organizer system (103) synchronizes the identifiers with each of the participating systems (101, 102) and creates a general identifier correspondence table (105). The correspondence table stores the object identifier in the organizing system and the corresponding identifiers in the participating systems and in at least one operator system.

1.1. The number of participating systems is not limited. However, at least one of the participating systems taking part in the identifier synchronization process must be an operator system (104) that performs operations on a set of objects using information about the performed calculations.

2. The customer system (106) selects in the organizer system (103) a computational algorithm (107) that will be used in secure multilateral computing. The algorithm uses data from one or more participating systems. The computation of the result is based on one of the secure multilateral computing methods (111) and is controlled by the organizer system (103). Calculations are performed simultaneously in the internal information circuits of all participating systems whose data is used by the selected computational algorithm.

3. The organizer system (103) transmits to all participating systems taking part in the computations their own identifiers corresponding to each object identifier in the organizing system. Participating systems carry out secure multilateral computations for the transmitted identifiers and return the results of their own computations (parts of the multilateral computation results) to the recovery module (108). The results recovery module (108) can be located in the computational circuit of any of the participants in the calculations.

4. As a result of the calculations, a set of target mobile systems (109) is created in the operator system. The operator (104) does not possess information on the purpose and characteristics of this set, while the availability of the set of target numbers is nevertheless ensured.

5. The organizer system (103) transmits to the customer system (106) the identifier of the set of target identifiers (110), which the customer system can use to place the target information in information systems according to the given list of target identifiers.
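The separation of knowledge in the last two steps can be sketched as follows: the operator system stores the target list and exposes only an opaque identifier of that list, and the customer system acts on the list through that identifier without seeing its content. This is a minimal in-process sketch; the class names, method names and the use of a random UUID as the list identifier are assumptions, not details from the application.

```python
import uuid
from typing import Dict, List, Tuple


class OperatorSystem:
    """Hypothetical operator: holds target lists, hands out opaque handles."""

    def __init__(self) -> None:
        self._segments: Dict[str, List[str]] = {}
        self.delivered: List[Tuple[str, str]] = []

    def store_segment(self, target_ids: List[str]) -> str:
        # The handle is random, so it carries no information about the list.
        handle = uuid.uuid4().hex
        self._segments[handle] = list(target_ids)
        return handle

    def act_on_segment(self, handle: str, message: str) -> None:
        # The operator acts on every target without learning why they were chosen.
        for target in self._segments[handle]:
            self.delivered.append((target, message))


class CustomerSystem:
    """Hypothetical customer: knows the purpose and the handle, not the targets."""

    def __init__(self, segment_handle: str) -> None:
        self.segment_handle = segment_handle

    def request_action(self, operator: OperatorSystem, message: str) -> None:
        operator.act_on_segment(self.segment_handle, message)
```

In a real deployment the two classes would sit on opposite sides of a network and trust boundary; within a single Python process the privacy of `_segments` is a convention only.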

A fundamental feature that makes a secure data sharing scheme possible is that the described method is aimed at solving the end tasks of the participating systems rather than at providing data sharing as such.

This ensures that there are no risks of unauthorized access to both the initial data of participants and the results of calculations.

FIG. 2 shows an example of a computing device (500), which is used to implement the claimed solution. The device (500) can be selected from a wide range of known devices that provide the required functionality, for example, a computer, laptop, server, tablet, smartphone, portable game console, mainframe, supercomputer, etc. The device (500) contains one or more processors (501) connected to at least one memory (502), data storage (503), input/output interfaces (504), input/output devices (505), and networking tools (506).

The processor (501) (or multiple processors, or a multi-core processor) can be selected from a wide range of devices currently in widespread use, such as Intel™, AMD™, Apple™, Samsung Exynos™, MediaTek™, Qualcomm Snapdragon™, etc. The processor (501) of the device (500) performs the basic computational operations necessary for the operation of the device (500) or for the functionality of one or more of its components. The processor (501) executes the necessary machine-readable instructions contained in the random access memory (RAM) (502).

Memory (502), as a rule, is implemented as RAM and contains the program logic needed to provide the required functionality. RAM is random access memory intended for storing machine-readable instructions executed by the processor (501) to perform the operations necessary for logical data processing. RAM usually contains executable instructions of the operating system and related software components (applications, software modules, etc.).

The data storage medium (503) can take the form of HDDs, SSDs, a RAID array, network storage, flash memory, optical storage devices (CD, DVD, MD, Blu-ray discs), etc. The data storage medium (503) provides long-term storage of various types of information, for example, files with user data sets, databases containing records of time intervals measured for each user, user IDs, etc.

Various types of input/output interfaces (504) are used to organize the operation of the components of the device (500) and of externally connected devices. The choice of interfaces depends on the specific design of the computing device and may include, but is not limited to: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, type C), TRS/Audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232, etc.

To ensure the interaction between the user and the computing device (500), various means (505) of input/output information are used, for example, a keyboard, display (monitor), touch display, touch-pad, joystick, mouse manipulator, light pen, stylus, touch panel, trackball, speakers, microphone, augmented reality, optical sensors, tablet, light indicators, projector, camera, biometric identification (retina scanner, fingerprint scanner, voice recognition module), etc.

The networking tools (506) provide data transmission by the device (500) via an internal or external computer network, for example, an intranet, the Internet, a LAN, etc. One or more networking tools (506) may be used, including but not limited to: an Ethernet card, GSM modem, GPRS modem, LTE modem, 5G modem, satellite communication module, NFC module, Bluetooth and/or BLE module, Wi-Fi module, etc. Additionally, satellite navigation means can be used, for example, GPS, GLONASS, BeiDou, Galileo.

FIG. 3 shows a method for processing information about objects on the example of its implementation for communication purposes:

(1) If the Advertiser contacts the DataHub to organize an advertising campaign, the Advertiser will be offered the following option in the algorithm catalog (the model catalog): "communication (advertising) segment of the audience, in which there will be the result of calculations based on combined data from different sources". More specifically, this model will state that, for its operation, it needs, for example, bank data on income and aggregated data from an Internet provider about a person's behavior on the Internet. That is, based on the combined knowledge, it is possible to build a segment containing the results of a targeted query and to conduct a highly targeted advertising campaign for the sale of goods.

(2) The DataHub, after receiving the order for creating a segment, transmits the corresponding computational algorithm to the bank, to the Internet provider and to the communication platform, and manages confidential co-computing based on this algorithm.

(3) The result of the calculations - the target segment requested by the advertiser - is collected only within the communication platform: no one except the communication platform knows which personal identifiers this segment consists of. Thus, none of the participants - neither the data providers nor the DataHub itself - can use the content of this segment to independently create a segment in any communication platform for reuse, or use the content of this segment in another unintended way. The initial data from which the target segment was composed remain with the Bank and the Internet provider, respectively; this data also remains inaccessible to any of the participants in the declared scheme. In addition, an option can be implemented in which the communication platform does not know, in the process of computing, who is participating in these calculations (for this, the DataHub or another system can proxy communications in confidential joint calculations so that the parties cannot identify other participants in the joint calculations).

(4) Based on the results of the calculations, the communication platform transmits the segment identifier (the results of the target request), which can be used to refer to this segment, to the DataHub, and it transfers it to the Advertiser. In addition, it is possible that the segment identifier is transmitted to the Advertiser bypassing the DataHub.

(5) After the Advertiser receives the identifier, the Advertiser can contact the communication platform to carry out targeted advertising or other communications with the segment whose identifier was previously obtained. It should be noted that the Advertiser does not gain access to the content of the segment, which is stored in the communication platform.

(6) An additional option is possible in which the communication platform does not itself participate in confidential co-computing, but is replaced by an intermediate participant, for example, an advertising agency. The segment is then assembled inside this participant, which transfers its content to the communication platform.
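The proxying option mentioned in step (3), where the DataHub or another system relays communications so that the parties cannot identify one another, can be sketched as a relay that knows participants only by random aliases and forwards payloads it treats as opaque bytes. This is a hypothetical illustration: the class, the alias scheme and the assumption that payloads are already encrypted by the participants are not specified in the application.

```python
import secrets
from typing import Dict, List, Tuple


class RelayHub:
    """Hypothetical relay: routes opaque payloads between aliased parties."""

    def __init__(self) -> None:
        self._alias_to_party: Dict[str, str] = {}
        self.inboxes: Dict[str, List[Tuple[str, bytes]]] = {}

    def register(self, party: str) -> str:
        # Each party receives a random alias; peers only ever see aliases.
        alias = "p-" + secrets.token_hex(4)
        self._alias_to_party[alias] = party
        self.inboxes[party] = []
        return alias

    def forward(self, sender_alias: str, recipient_alias: str, payload: bytes) -> None:
        # The payload is assumed to be encrypted end to end by the participants,
        # so the relay passes it on without interpreting it.
        recipient = self._alias_to_party[recipient_alias]
        self.inboxes[recipient].append((sender_alias, payload))
```

The recipient sees only the sender's alias and the payload, while the relay sees who talks to whom but, under the stated encryption assumption, not what is said.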

FIG. 4 shows a method for processing information about the objects using the example of the customs service:

(1) The Managing Authority faces a challenge: Customs is challenged by the significant seasonal increase in the number of parcels passing through customs. To improve the efficiency of customs, it is required to classify parcels using data from external data providers. This will allow creating narrow segments of parcels for different scenarios of the customs service and will effectively redistribute the efforts of customs officials.

External data providers are not ready to provide additional information about senders, recipients, parcel routes and other data required for analysis. The reasons can be different, including legal restrictions on the transfer of the required type of data and unwillingness to lose control over the data.

The proposed scheme is the solution to this problem.

In the catalog of models there are several models that can determine with high accuracy the parcels that, with a probability acceptable to the governing body, are correctly declared. For the model to work, it needs knowledge about senders or recipients that is stored in the Bank (no problems for individuals, a high level of confidence of the Bank in senders - online stores), as well as data that is stored by an Internet provider (negative behavioral patterns of individuals, traffic of the sender - online store).

Thus, the model, based on the combined data, will construct a segment of parcels that do not require the attention of the customs service with an acceptable probability.

The governing body selects the model and sends the order to the DataHub.

(2) After receiving the order for creating a segment, the DataHub distributes the corresponding algorithm to the Bank, the Internet provider and the customs service, and manages the joint calculation of this model.

(3) The result of the calculations - the target segment with the list of parcels - is collected only within the customs office: no one except the customs office knows which parcel identifiers this segment consists of. Thus, none of the participants - neither the data providers nor the DataHub itself - can take the contents of this segment and, on the basis of it, find out additional information about a parcel, such as the sender, the recipient or the route. This matters because there are services that provide some additional information about a parcel from its number alone, and that number is what is stored in the list of parcel identifiers. The initial data on online behavior and the assessments of individuals and legal entities remain with the Bank and the Internet provider, and these data also remain inaccessible to any of the participants in the scheme.

(4) After receiving the results of the calculations, the customs service transmits the segment identifier, which can be used to refer to this segment, to the DataHub, which transfers it to the Managing Authority.

It is possible that the segment identifier is transmitted to the Managing Authority bypassing the DataHub.

(5) The Managing Authority receives the identifier and is now able, depending on the circumstances, to instruct the Customs Service on the handling mode for this parcel segment. It is important that the Managing Authority does not get access to the contents of the segment (the list of parcels): the segment is stored in the Customs Service, and information about its contents is not transmitted to the Managing Authority. The Customs Service, in turn, does not know the characteristics of the segment and has no information about its purpose.

FIG. 5 shows a computer-implemented method for processing information about an object using the example of operations with computer software:

(1) Based on the data that the Internet provider, the Bank and other possible information providers have, it is possible to gain additional knowledge about the user's computer, which, in turn, can make the computer or software work more efficiently and securely. This new knowledge can be used by software developers or other actors who can change the settings or behavior of software or of computers as a whole. The problem is that external data providers are not ready to provide the information about the computer, or about the software running on it, that is necessary for analysis and decision-making regarding the operation of that computer or software. The reasons can be different, including legal restrictions on the transfer of the required type of data and unwillingness to lose control over the data. Moreover, users do not want information about their computers or software to reach the developers of this software, because these companies can use this knowledge for purposes other than the intended one. The solution is the proposed scheme of work.

There are several models in the model catalog that can accurately determine the identifiers of computers or software that, with an acceptable probability, require increased control measures for network interaction. For the model to work, it needs knowledge about computer users that is stored in the Bank (an increase in the number of canceled transactions, authorization errors in banking software, and so on) and by the Internet provider (negative behavioral patterns of individuals, suspicious network behavior of software, and so on).

Thus, the model, based on the combined data, will build a list of computers or software requiring additional attention to security, changing settings, or other operations that can make the software or computer more efficient and secure.

The Security Center, which is independent of software developers or operators, chooses a model and sends the order to the DataHub.

(2) After receiving an order for creating a segment, the DataHub distributes the corresponding algorithm to the Bank, the Internet provider and the software owner, and manages the joint calculation of this model.

(3) The result of the calculations - the target segment with a list of computer or software identifiers - is collected only within the contour of the software owner: no one except the software owner knows which computer or software identifiers this segment consists of. Thus, none of the participants - neither the data providers nor the DataHub itself - can take the content of this segment and, on the basis of it, find out additional information about the computer or software, or influence the computer or software based on knowledge of the purpose and content of the segment. The initial data on network behavior and the Bank's assessments remain with the Internet provider and the Bank; these data also remain inaccessible to any of the participants in the scheme.

(4) Based on the results, the owner of the software transmits the segment identifier, by which it is possible to refer to this segment, to the DataHub, which transfers it to the Security Center.

A variant is possible when the segment identifier is transmitted to the Security Center bypassing the DataHub.

(5) The Security Center receives the identifier and can now, depending on the circumstances, instruct the owner of the software on the mode of operation for this segment. It is important that the Security Center does not get access to the contents of the segment (the list of computer or software identifiers): the segment is kept by the software owner, and information about the identifiers is not transferred to the Security Center. The owner of the software, in turn, does not know the characteristics of the segment and has no information about its purpose.

The present application materials disclose a preferred implementation of the claimed technical solution, which should not be construed as limiting other, particular embodiments that do not go beyond the claimed scope of legal protection and are obvious to specialists in the relevant field of technology.

Claims

1. A data processing method, comprising secure multilateral computing and data analysis methods, consisting of the following stages: a) one or more systems participating in secure multilateral computing using a common computational algorithm synchronize object identifiers; b) one or more systems participating in secure multilateral computing synchronize the nomenclature of object attributes; c) one or more systems participating in secure multilateral computing use a common computational algorithm that performs operations on object attributes, the result of which is a list of target identifiers; e) the list of target identifiers is passed to one of the systems participating in secure multilateral computing, which is the operator system and can operate on one or more objects from the list of target identifiers; g) the identifier of the received list of target object identifiers is transferred to the customer system in such a way that the customer system is able, through the identifier of this list, to perform actions with the objects whose list is held in the operator system, while the customer system has no information about the content of this list.

2. Data processing method according to claim 1, characterized by the fact that the objects can be identifiers.

3. Data processing method according to claim 1, characterized by the fact that objects can be groups of objects.

4. Data processing method according to claim 1, characterized by the fact that identifiers can be a sequence of characters.

5. Data processing method according to claim 1, characterized by the fact that identifiers can be a sequence of bits.

6. Data processing method according to claim 1, characterized by the fact that identifiers can be graphical images.

7. Data processing method according to claim 1, characterized by the fact that identifiers can be biometric data.

8. Data processing method according to claim 1, characterized by the fact that identifiers can be audio information.

9. Data processing method according to claim 1, characterized by the fact that participating systems synchronize object identifiers so that all participating systems participating in joint computing identify data related to the same object, regardless of which participating systems store information about this object.

10. Data processing method according to claim 1, characterized by the fact that participating systems synchronize object identifiers with the participation of one of the participating systems as the organizing system of the identifier synchronization process, and the organizing system stores a table of correspondences of object identifiers of the participating systems, receives information about object identifiers from the participating systems and sends them object identifiers, received from participating systems, including from the organizing system.

11. Data processing method according to claim 1, characterized by the fact that participating systems synchronize the parameters of the nomenclature of attributes with the participation of one of the participating systems as the organizing system of the identifier synchronization process, and the organizing system stores the table of correspondences of object identifiers of the participating systems, receives information about the object identifiers from the participating systems and sends them object identifiers received from participating systems, including from the organizing system.

12. Data processing method according to claim 1, characterized by the fact that participating systems participating in joint computing use a common nomenclature of object attributes in such a way that each participating system participating in the calculations contains a single catalog of object attribute names for all participating systems.

13. Data processing method according to claim 1, characterized by the fact that computer programs are used as a computational algorithm in joint computing.

14. Data processing method according to claim 1, characterized by the fact that to execute computational algorithms in joint computing, the participating systems use a computer program that executes the computational algorithm code.

15. Data processing method according to claim 1, characterized by the fact that for the execution of computational algorithms within the computational circuit of the participating systems, the source code of the computer program is transferred to the participating systems for auditing and controlled execution of the computer program by the participating system within its information space.

16. Data processing method according to claim 1, characterized by the fact that to execute computational algorithms within the computational circuit of the participating systems, the participating systems are transferred a computer program for auditing and controlled execution of the computer program by the participating system within its information space.

17. Data processing method according to claim 1, characterized by the fact that participating systems use a common catalog of computational algorithms.

18. Data processing method according to claim 1, characterized by the fact that on the basis of training samples provided by the participating systems, computational algorithms are created for usage in joint computing.

19. Data processing method according to claim 1, characterized by the fact that participating systems transfer the attributes of selected objects to the participating system, which is the organizing system for creating computational algorithms.

20. Data processing method according to claim 1, characterized by the fact that communications in confidential co-computing between the participating systems can be carried out not directly between the participants, but through another system, while such a system does not gain access to the contents of the communications, since the participating systems ensure the confidentiality of their own communications, and the identification of the participating systems with each other is impossible due to the use of an intermediate system that hides the identification information of systems from each other.