WO2021184347A1 - Data processing method and device for realizing privacy protection
- Publication number
- WO2021184347A1 (PCT/CN2020/080392)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- operator
- plaintext
- machine learning
- learning model
- data
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- This application relates to the field of data security technology, and in particular to a data processing method and device for realizing privacy protection.
- The embodiments of this specification provide a data processing method and device for realizing privacy protection, so as to solve the problem of poor usability of privacy-preserving machine learning frameworks in the prior art.
- The embodiments of this specification provide a data processing method for realizing privacy protection, which is executed by multiple data holders, each of which stores its own private sample data.
- The method includes: replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator, to obtain the private machine learning model corresponding to the preset plaintext machine learning model; and performing secure multi-party computation to jointly train the private machine learning model based on the private sample data stored by each data holder, and outputting the target machine learning model.
- Replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator includes: obtaining a plurality of cryptographic operators and registering each of them; and obtaining an optimizer program and registering it, where the optimizer program is used to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
- The optimizer program is also used to replace the plaintext operator gradient function in the preset plaintext machine learning model with the corresponding cryptographic operator gradient function. Accordingly, after obtaining the plurality of cryptographic operators and registering each of them, the method further includes: obtaining the cryptographic operator gradient function corresponding to each cryptographic operator and registering it; and associating each registered cryptographic operator gradient function with the corresponding registered cryptographic operator.
- Replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator includes: obtaining an installation package file, where the installation package file is built from a binary file, the binary file is compiled from the source code for implementing multiple cryptographic operators, the source code for implementing the optimizer program, the source code for registering the multiple cryptographic operators, and the source code for registering the optimizer program, and the optimizer program, when executed, replaces the plaintext operator in the plaintext model with the corresponding cryptographic operator; installing the installation package file; and importing the installed file into the preset plaintext machine learning model, so as to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
- Replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator includes: determining, based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model, the plaintext operator that needs to be replaced in the preset plaintext machine learning model; and replacing that plaintext operator with the corresponding cryptographic operator.
- the optimizer program includes at least one of the following: a static optimizer program and a dynamic optimizer program.
- the cryptographic operator includes at least one of the following: a secure multi-party computing operator, a homomorphic encryption operator, and a zero-knowledge proof operator.
- The embodiments of this specification also provide a data processing device for realizing privacy protection, which is located in each of the multiple data holders, each of which stores its own private sample data.
- The device includes: a replacement module, used to replace the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator to obtain the private machine learning model corresponding to the preset plaintext machine learning model; and a training module, used to perform secure multi-party computation to jointly train the private machine learning model based on the private sample data stored by each data holder, and to output the target machine learning model.
- The embodiments of this specification also provide a computer device, including a processor and a memory for storing instructions executable by the processor.
- When executing the instructions, the processor implements the steps of the data processing method for realizing privacy protection described in any of the foregoing embodiments.
- the embodiments of this specification also provide a computer-readable storage medium on which computer instructions are stored, and when the instructions are executed, the steps of the data processing method for realizing privacy protection described in any of the foregoing embodiments are implemented.
- a data processing method for realizing privacy protection is provided, which is executed by multiple data holders.
- Each data holder stores its own private sample data, and each data holder replaces the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator to obtain the private machine learning model.
- The multiple data holders then perform secure multi-party computation to jointly train the private machine learning model based on their respective private sample data, and output the target machine learning model.
- Because the private machine learning model is jointly trained based on the respective private sample data to obtain the target machine learning model, the privacy of each data holder's private sample data is effectively protected.
- The above solution requires neither additional application program interfaces nor additional private data types. It only requires replacing the plaintext operator in the plaintext machine learning model with the corresponding cryptographic operator, which makes it simple to operate and easy to use.
- FIG. 1 shows a schematic diagram of an application scenario of a data processing method for realizing privacy protection in an embodiment of this specification
- Figure 2 shows a flow chart of a data processing method for implementing privacy protection in an embodiment of this specification
- Figure 3 shows a block diagram of a data processing method for implementing privacy protection in an embodiment of this specification
- FIG. 4 shows a schematic diagram of a data processing device for implementing privacy protection in an embodiment of this specification
- Fig. 5 shows a schematic diagram of a computer device in an embodiment of this specification.
- a plaintext machine learning model can be written in a machine learning framework.
- the plaintext machine learning model may include local plaintext operators provided by the machine learning framework.
- the local plaintext operator in the plaintext machine learning model can be replaced with the corresponding cryptographic operator to obtain the corresponding private machine learning model. This replacement is transparent to the user.
- the private machine learning model can be trained based on the private sample data to obtain the target machine learning model.
- FIG. 1 shows a schematic diagram of the above-mentioned scenario example.
- three data holders are exemplarily shown: data holder 1, data holder 2 and data holder 3.
- the data holder 1 stores the first privacy sample data
- the data holder 2 stores the second privacy sample data
- the data holder 3 stores the third privacy sample data.
- Data holder 1, data holder 2 and data holder 3 can each obtain the preset plaintext machine learning model to be trained, and replace the plaintext operator in the preset plaintext machine learning model with a cryptographic operator to obtain the private machine learning model corresponding to the preset plaintext machine learning model.
- the data holder 1, the data holder 2 and the data holder 3 can perform secure multi-party calculations to jointly train the private machine learning model based on their respective private sample data to obtain the target machine learning model. Joint training of privacy machine learning models through secure multi-party computing can protect the privacy sample data of each data holder from being leaked, effectively protecting data privacy.
- the preset plaintext machine learning model may be implemented based on the plaintext machine learning framework.
- The plaintext machine learning framework can be any existing plaintext machine learning framework, for example TensorFlow, PyTorch, MXNet, CNTK-Azure and other frameworks. This specification does not limit the specific plaintext machine learning framework used to generate the plaintext machine learning model, and it can be selected according to actual needs.
- The cryptographic operator can be any cryptographic operator that provides privacy protection for the input data of all parties in a scenario where two or more data holders jointly (or collaboratively) perform machine learning training and prediction.
- The cryptographic operator may be a secure multi-party computation (MPC) operator, a homomorphic encryption (HE) operator, a zero-knowledge proof (ZKP) operator, etc.
- MPC Secure Multi-Party Computation
- HE homomorphic Encryption
- ZKP zero-knowledge proof
- This specification does not limit the specific cryptographic operators used, which can be selected according to actual needs.
- Developers can write the source code that implements the cryptographic operators corresponding to the local plaintext operators in the machine learning framework and the source code that registers those cryptographic operators in the machine learning framework; the source code that implements the gradient functions of the cryptographic operators and the source code that registers those gradient functions in the machine learning framework; and the source code of the optimizer program and the source code that registers the optimizer program in the machine learning framework.
- The optimizer program is used to replace the plaintext operator in the plaintext machine learning model with the corresponding cryptographic operator.
- Developers can compile the source code implementing the cryptographic operators, the source code registering the cryptographic operators, the source code of the cryptographic operator gradient functions, the source code registering the cryptographic operator gradient functions, the source code of the optimizer program, and the source code registering the optimizer program into a binary file.
- A Python installation package is generated based on the binary file (for example, with the file name Rosetta.whl).
- the data holder 1, the data holder 2 and the data holder 3 can download the installation package and install it. Users of data holder 1, data holder 2 and data holder 3 only need to add a line of code "import Rosetta" under the plaintext machine learning model written by them to support the privacy protection function.
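- The effect of a single added import line can be illustrated with a minimal sketch. It assumes a hypothetical framework module `toy_framework` with an operator table `OPS`, and a hypothetical package `toy_privacy` whose top-level code runs on import and swaps the operator. These names are invented for illustration and are not the API of Rosetta or of any real framework, and the "cryptographic" operator is only a placeholder that performs no cryptography.

```python
import importlib
import pathlib
import sys
import tempfile
import types

# Hypothetical stand-in for a plaintext framework and its operator table.
framework = types.ModuleType("toy_framework")
framework.OPS = {"MatMul": lambda a, b: ("plaintext", a * b)}
sys.modules["toy_framework"] = framework

# Source of a hypothetical privacy package; its top-level code runs on import.
PACKAGE_SOURCE = """
import toy_framework

def secure_matmul(a, b):
    # Placeholder only: a real cryptographic operator would run an MPC protocol.
    return ("cryptographic", a * b)

toy_framework.OPS["MatMul"] = secure_matmul
"""


def run_model():
    # The user's model code is unchanged; it always calls OPS["MatMul"].
    return framework.OPS["MatMul"](3, 4)


before = run_model()

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "toy_privacy.py").write_text(PACKAGE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    import toy_privacy  # the single added line
    sys.path.remove(tmp)

after = run_model()
print(before, after)
```

- In this sketch the model function itself is never edited; only the operator it resolves at call time changes, which is the sense in which the replacement is transparent to the user.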
- secure multi-party calculations can be performed to jointly train the model based on private sample data to obtain a trained target machine learning model. After the target machine learning model is obtained, the target machine learning model can be used to make predictions.
- the target machine learning model obtained by each data holder through joint training may be a complete machine learning model, that is, the model parameters are complete.
- each data holder can obtain the data to be predicted, and input the data to be predicted into the trained complete machine learning model to obtain the prediction result.
- the data holder 1 can obtain the data to be predicted, and the prediction result can be obtained by inputting the data to be predicted into the complete machine learning model.
- the target machine learning model obtained by each data holder through joint training may be an incomplete machine learning model, that is, the model parameters are incomplete.
- One of the multiple data holders can obtain the data to be predicted, and that data holder and the other data holders perform secure multi-party computation to obtain the joint prediction result for the data to be predicted.
- the data holder 1 can obtain the data to be predicted.
- the data holder 1 and the data holder 2 and the data holder 3 perform secure multi-party calculations to perform joint prediction based on the data to be predicted and the target machine learning model to obtain a joint prediction result.
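- Joint prediction with an incomplete model can be sketched as follows, assuming (as an illustration only) that the model is linear with integer weights and that each data holder keeps one additive share of every weight modulo a prime. The function names and the modulus are hypothetical. In this toy version the input of data holder 1 is visible to all holders; a real protocol would also protect the input and use a secure multiplication protocol.

```python
import secrets

P = 2**61 - 1  # prime modulus of the toy field (an assumption of this sketch)


def share(value, n_parties):
    """Split an integer into n additive shares modulo P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def local_partial(weight_shares, x):
    """Each holder combines its own weight shares with the input features."""
    return sum(w * xi for w, xi in zip(weight_shares, x)) % P


weights = [3, 5, 2]  # full parameters; after sharing, no single holder has them
per_weight = [share(w, 3) for w in weights]
# Row i is the incomplete parameter set stored by data holder i.
holder_params = [[per_weight[j][i] for j in range(3)] for i in range(3)]

x = [10, 20, 30]  # data to be predicted, supplied by data holder 1
partials = [local_partial(params, x) for params in holder_params]
prediction = sum(partials) % P  # only the combined result reveals the output
print(prediction)
```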
- the embodiment of this specification provides a data processing method for realizing privacy protection, which is executed by multiple data holders, and each of the multiple data holders stores its own private sample data.
- FIG. 2 shows a flowchart of a data processing method for realizing privacy protection in an embodiment of this specification.
- Although this specification provides method operation steps or device structures as shown in the following embodiments or drawings, the method or device may include more or fewer operation steps or module units based on conventional or non-inventive effort.
- The execution order of these steps or the module structure of the device is not limited to the execution order or module structure shown in the embodiments and drawings of this specification.
- When the described method or module structure is applied to an actual device or terminal product, it can be executed sequentially or in parallel according to the method or module structure shown in the embodiments or drawings (for example, in a parallel-processor or multi-threaded environment, or even a distributed processing environment).
- the data processing method for realizing privacy protection may include the following steps.
- Step S201: replace the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator to obtain the private machine learning model corresponding to the preset plaintext machine learning model.
- Each of the multiple data holders may store a preset plaintext machine learning model.
- The preset plaintext machine learning model may be a machine learning model established by a user in a machine learning framework.
- The machine learning framework can refer to any machine learning system or method that includes machine learning algorithms, and can include methods for representing and processing data, methods for representing and building predictive models, and methods for evaluating and using modeling results.
- The machine learning framework can include one of the following: TensorFlow, PyTorch, MXNet, CNTK-Azure and other frameworks.
- the machine learning framework can include multiple local plaintext operators.
- the preset plaintext machine learning model established based on the machine learning framework may include local plaintext operators (referred to as plaintext operators) in the machine learning framework.
- Each of the multiple data holders may store their own private sample data.
- Each data holder can replace the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator to obtain the private machine learning model corresponding to the preset plaintext machine learning model. Here, a cryptographic operator refers to an operator that implements the operation of the corresponding plaintext operator on encrypted data.
- Step S202: perform secure multi-party computation to jointly train the private machine learning model based on the private sample data stored by each data holder, and output the target machine learning model.
- After each data holder obtains the private machine learning model, the multiple data holders can perform secure multi-party computation to jointly train the private machine learning model based on the private sample data stored by each data holder.
- Through the joint training, the model parameters in the private machine learning model can be determined, and the trained target machine learning model can be obtained.
- After outputting the target machine learning model, each data holder can process the data to be predicted based on the target machine learning model to obtain a prediction result.
- The method in the above embodiment only needs to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator to obtain the corresponding private machine learning model. The multiple data holders then perform secure multi-party computation to jointly train the private machine learning model based on their respective private sample data and obtain the target machine learning model, which effectively protects the privacy of each data holder's private sample data.
- The above solution requires neither additional application program interfaces nor additional private data types. It only requires replacing the plaintext operator in the plaintext machine learning model with the corresponding cryptographic operator, which makes it simple to operate and easy to use.
- Replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator may include: obtaining multiple cryptographic operators and registering each of them; and obtaining an optimizer program and registering it, where the optimizer program is used to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
- each data holder can obtain multiple cryptographic operators.
- The multiple cryptographic operators can correspond to multiple plaintext operators in the machine learning framework. That is, the obtained cryptographic operators may include the cryptographic operators corresponding to the plaintext operators in the preset plaintext machine learning model.
- each data holder can receive multiple cryptographic operators input by the user.
- multiple cryptographic operators may be stored in the server, and each data holder may send an acquisition request to the server, and the server sends the multiple cryptographic operators to each data holder in response to the acquisition request.
- each data holder can register each cryptographic operator into the machine learning framework. For example, each data holder may register each of the multiple cryptographic operators into the machine learning framework based on the operator registration interface provided by the machine learning framework.
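- A registration interface of this kind can be sketched with a simple table keyed by the name of the plaintext operator being shadowed. The registry `CRYPTO_OPS` and the decorator `register_crypto_op` are hypothetical names for illustration; real frameworks expose their own registration mechanisms, and the operator bodies below are placeholders without any cryptography.

```python
from typing import Callable, Dict

# Hypothetical registry: plaintext operator name -> cryptographic operator.
CRYPTO_OPS: Dict[str, Callable] = {}


def register_crypto_op(plaintext_name: str):
    """Decorator that registers a cryptographic operator for a plaintext operator."""
    def decorator(fn: Callable) -> Callable:
        if plaintext_name in CRYPTO_OPS:
            raise ValueError(f"{plaintext_name} is already registered")
        CRYPTO_OPS[plaintext_name] = fn
        return fn
    return decorator


@register_crypto_op("Add")
def secure_add(a, b):
    return a + b  # placeholder body; a real operator would work on protected data


@register_crypto_op("MatMul")
def secure_matmul(a, b):
    return a * b  # placeholder body


print(sorted(CRYPTO_OPS))
```

- Rejecting a duplicate registration is a design choice of this sketch; it keeps the mapping from plaintext operator to cryptographic operator unambiguous for the optimizer program.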
- Each data holder can obtain the optimizer program.
- the optimizer program is used to replace the plaintext operator in the preset plaintext machine learning model established based on the machine learning framework with the corresponding cryptographic operator.
- each data holder can receive an optimizer program input by the user.
- the optimizer program may be stored in the server, and each data holder may send an acquisition request to the server, and the server sends the optimizer program to each data holder in response to the acquisition request.
- each data holder can register the optimizer program in the machine learning framework. For example, each data holder may register the optimizer program in the machine learning framework based on the optimizer registration interface provided by the machine learning framework.
- After the cryptographic operators and the optimizer program are registered in the machine learning framework, the plaintext operators in the preset plaintext machine learning model established in the machine learning framework can be replaced with the corresponding cryptographic operators to obtain the private machine learning model corresponding to the preset plaintext machine learning model.
- the optimizer program may include at least one of the following: a static optimizer program and a dynamic optimizer program.
- A static optimizer program replaces the plaintext operators in the plaintext machine learning model with the corresponding cryptographic operators before the data flow graph of the plaintext machine learning model is executed.
- A dynamic optimizer program replaces the plaintext operators in the plaintext machine learning model with the corresponding cryptographic operators while the data flow graph of the plaintext machine learning model is being executed.
- the optimizer program may include a static optimizer program, may also include a dynamic optimizer program, or may include both a dynamic optimizer program and a static optimizer program.
- the optimizer program may include at least one of a static optimizer program and a dynamic optimizer program, which has high flexibility.
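- The difference between the two kinds of optimizer program can be sketched on a toy graph represented as a list of (operator name, arguments) nodes. The operator names, the replacement table and the placeholder kernels are assumptions made for illustration and do not correspond to any real framework.

```python
# Hypothetical mapping from plaintext operators to cryptographic operators.
REPLACEMENTS = {"Add": "SecureAdd", "MatMul": "SecureMatMul"}

KERNELS = {
    "Add": lambda a, b: a + b,
    "MatMul": lambda a, b: a * b,
    "SecureAdd": lambda a, b: a + b,      # placeholder, no real cryptography
    "SecureMatMul": lambda a, b: a * b,   # placeholder, no real cryptography
}


def static_optimize(graph):
    """Static pass: rewrite the whole graph before anything is executed."""
    return [(REPLACEMENTS.get(op, op), args) for op, args in graph]


def execute(graph, dynamic=False):
    """Run nodes in order; with dynamic=True, swap each operator as it is reached."""
    trace, result = [], None
    for op, args in graph:
        if dynamic:
            op = REPLACEMENTS.get(op, op)
        trace.append(op)
        result = KERNELS[op](*args)
    return trace, result


graph = [("MatMul", (2, 3)), ("Add", (6, 1))]
static_trace, static_result = execute(static_optimize(graph))
dynamic_trace, dynamic_result = execute(graph, dynamic=True)
print(static_trace, dynamic_trace)
```

- Both paths end up running the same cryptographic operators; they differ only in when the substitution happens, which is why the two optimizers can be offered separately or together.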
- The optimizer program can also be used to replace the plaintext operator gradient function in the preset plaintext machine learning model with the corresponding cryptographic operator gradient function. Accordingly, after obtaining the multiple cryptographic operators and registering each of them, the method may further include: obtaining the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators and registering it; and associating each registered cryptographic operator gradient function with the corresponding registered cryptographic operator.
- Each data holder can obtain the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators. For example, each data holder may receive the cryptographic operator gradient function corresponding to each cryptographic operator as input from the user.
- Alternatively, the cryptographic operator gradient function corresponding to each cryptographic operator can be stored on a server. Each data holder can send an acquisition request to the server, and the server, in response to the acquisition request, sends the cryptographic operator gradient function corresponding to each cryptographic operator to each data holder.
- Each data holder can register the cryptographic operator gradient function corresponding to each cryptographic operator into the machine learning framework. For example, each data holder may register the gradient function in the machine learning framework through the gradient function registration interface provided by the machine learning framework. Each data holder can also associate each cryptographic operator with its corresponding cryptographic operator gradient function. With this association, the machine learning framework can find the matching cryptographic operator gradient function from the cryptographic operator when performing automatic differentiation.
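- The association between an operator and its gradient function can be sketched with two tables that share the operator name as key. The names `register_op`, `register_gradient` and `backprop` are hypothetical, and the multiplication operator is a plaintext placeholder used only to show the lookup performed during automatic differentiation.

```python
OPS = {}        # operator name -> forward function
GRADIENTS = {}  # operator name -> gradient function


def register_op(name, fn):
    OPS[name] = fn


def register_gradient(op_name, grad_fn):
    """Associate a gradient function with an already registered operator."""
    if op_name not in OPS:
        raise KeyError(f"operator {op_name} must be registered first")
    GRADIENTS[op_name] = grad_fn


register_op("SecureMul", lambda a, b: a * b)  # placeholder forward pass
# For y = a * b: dy/da = b and dy/db = a, each scaled by the upstream gradient.
register_gradient("SecureMul", lambda a, b, upstream: (upstream * b, upstream * a))


def backprop(op_name, inputs, upstream=1.0):
    """Find the gradient function by operator name, as an autodiff engine would."""
    return GRADIENTS[op_name](*inputs, upstream)


grads = backprop("SecureMul", (3.0, 4.0))
print(grads)
```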
- Replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator may include: obtaining an installation package file, where the installation package file is built from a binary file, the binary file is compiled from the source code for implementing multiple cryptographic operators, the source code for implementing the optimizer program, the source code for registering the multiple cryptographic operators, and the source code for registering the optimizer program, and the optimizer program, when executed, replaces the plaintext operator in the plaintext model with the corresponding cryptographic operator; installing the installation package file; and importing the installed file into the preset plaintext machine learning model, so as to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
- the replacement of the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator can be achieved in the form of an installation package.
- the source code for implementing multiple cryptographic operators, the source code for registering multiple cryptographic operators, the source code for implementing the optimizer program, and the source code for registering the optimizer program can be compiled into a binary file.
- an installation package file can be generated based on the compiled binary file.
- Each data holder can obtain the installation package file.
- the installation package file can be stored in the server.
- Each data holder can send a download request to the server, and the server sends an installation package file to each data holder in response to the download request.
- each data holder can install the installation package file to obtain the installed file.
- The installed file may include interpreted code and executable code.
- each data holder can import the installed file into the plaintext machine learning model, thereby replacing the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
- For example, users of each data holder can add one line of code to the preset plaintext model (for example, import Rosetta, where Rosetta is the name of the installation package) to import the installed files into the preset plaintext machine learning model. In this way, the plaintext operators in the plaintext machine learning model can be replaced with the corresponding cryptographic operators.
- The method in the above embodiment only needs to add one line of code to the preset plaintext machine learning model to obtain the corresponding private machine learning model. The operation is convenient and simple, requires no additional application program interfaces or private data types, and is easy to use.
- A plaintext machine learning model can be ported very conveniently with the above method. With other privacy-preserving machine learning frameworks, porting a plaintext machine learning model requires rewriting it with the corresponding private data types and private application program interfaces. With this solution, adding one line of import code to the source code of the plaintext model is enough to support privacy protection, so portability is extremely high. In addition, the model can be easily extended with the above method.
- Replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator may include: determining, based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model, the plaintext operator that needs to be replaced in the preset plaintext machine learning model; and replacing that plaintext operator with the corresponding cryptographic operator.
- the plaintext operator to be replaced in the preset plaintext machine learning model may be determined first.
- the plaintext operators that need to be replaced in the preset plaintext machine learning model can be determined based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model.
- the data flow graph can be used to represent the data flow information in the plaintext machine learning model.
- For example, the data flow graph may be a tensor flow graph (TensorFlow Graph).
- The nodes in the tensor flow graph represent mathematical operations, and the edges represent the multi-dimensional data arrays passed between nodes, that is, tensors.
- Because the plaintext operators in the preset plaintext machine learning model are replaced with the corresponding cryptographic operators in order to protect the private sample data, the operators through which the private sample data flows can be determined as the plaintext operators to be replaced.
- Because the private sample data is used to train the model to obtain model parameters (also referred to as training variables), the operators through which the training variables flow can also be determined as the plaintext operators to be replaced.
- the plaintext operator that needs to be replaced in the preset plaintext machine learning model can be replaced with the corresponding cryptographic operator.
- the plaintext operator to be replaced can be determined, thereby completing the replacement.
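- Selecting operators by data flow can be sketched as a reachability search over the graph edges: any operator reachable from private sample data or from a training variable is marked for replacement. The graph layout, node names and the function `ops_to_replace` are assumptions made for this illustration.

```python
from collections import deque


def ops_to_replace(edges, op_nodes, sensitive_sources):
    """Return the operators reachable from private data or training variables."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    seen = set(sensitive_sources)
    queue = deque(sensitive_sources)
    while queue:  # breadth-first walk along the direction of data flow
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(n for n in seen if n in op_nodes)


# Hypothetical graph: tensors and operators are nodes, edges follow the data.
edges = [
    ("private_x", "matmul"), ("weights", "matmul"),
    ("matmul", "add"), ("bias", "add"),
    ("learning_rate", "scale"), ("step", "scale"),
]
op_nodes = {"matmul", "add", "scale"}
selected = ops_to_replace(edges, op_nodes, ["private_x", "weights", "bias"])
print(selected)
```

- In this example the `scale` operator only touches non-sensitive inputs, so it stays a plaintext operator, which avoids paying the cost of cryptographic computation where no private data is involved.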
- the cryptographic operator can provide privacy protection for the input data of all parties in the joint (or collaborative) machine learning model training and prediction scenarios of two or more data holders.
- Cryptographic operators may include secure multi-party computation operators (MPC OPs), homomorphic encryption operators (HE OPs), or zero-knowledge proof operators (ZK OPs).
- MPC OPs secure multi-party computation operators
- HE OPs homomorphic encryption operators
- ZK OPs zero-knowledge proof operators
- the secure multi-party calculation operator may include one or more of Garbled Circuit, Oblivious Transfer, and Secret Sharing.
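- Of the techniques listed, secret sharing is the simplest to illustrate. The following is a minimal sketch of additive secret sharing among three parties over a prime modulus, in which two secret values are added without any party seeing either value. The modulus and function names are choices made for this sketch, and it omits everything a production protocol would need, such as secure channels and protection against malicious parties.

```python
import secrets

MODULUS = 2**61 - 1  # prime modulus chosen for this sketch


def share(secret, n_parties):
    """Split a secret into n random-looking additive shares."""
    parts = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    parts.append((secret - sum(parts)) % MODULUS)
    return parts


def reconstruct(parts):
    """Recover the secret; this requires all shares."""
    return sum(parts) % MODULUS


a_shares = share(25, 3)
b_shares = share(17, 3)
# Each party adds the two shares it holds; no communication is needed.
sum_shares = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]
total = reconstruct(sum_shares)
print(total)
```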
- The foregoing cryptographic operators are only exemplary. This specification does not limit the specific cryptographic operators to be used, which can be selected according to need.
- the secure multi-party calculation in step S202 can be executed based on the cryptographic operator replaced in step S201.
- The secure multi-party computation in step S202 may be secure multi-party computation in a broad sense, and may include performing at least one of MPC, HE, and ZK calculations when training the model.
- the cryptographic operator can be implemented in C language or C++ language.
- Implementing the cryptographic operators in C or C++ can improve the performance of the machine learning model.
- In some other privacy-preserving machine learning frameworks, the cryptographic algorithms are all implemented in Python.
- In this solution, the cryptographic operators are all implemented in C/C++, so performance is high.
- Other languages, such as Python, can also be used to implement the cryptographic operators.
- the TensorFlow framework is taken as an example for description.
- the data processing method for implementing privacy protection may include the following steps.
- Step 1 The framework developer uses C/C++ to write the source code that implements, on various devices, the cryptographic operators (Crypto OPs) corresponding to the native plaintext operators (TF Native OPs) of the TensorFlow framework.
- because the TensorFlow framework supports distributed data processing, it can be distributed across and executed on a variety of devices.
- various devices may include CPU (Central Processing Unit), GPU (Graphics Processing Unit), or TPU (Tensor Processing Unit), etc. Since different devices have different instruction sets for implementing cryptographic operators, it is necessary to implement cryptographic operators suitable for different devices.
- Step 2 The framework developer uses C/C++ to write the source code for registering the cryptographic operator in the TensorFlow framework.
- Step 3 The framework developer uses C/C++ to write the source code for implementing the cryptographic operator gradient functions (Crypto OP Gradient Functions) corresponding to the cryptographic operator.
- Step 4 The framework developer uses C/C++ to write the source code for registering the cryptographic operator gradient function in the TensorFlow framework and associating the cryptographic operator with the cryptographic operator gradient function. Associating each cryptographic operator with its gradient function enables the TensorFlow framework to find the matching gradient function automatically from the cryptographic operator when performing automatic differentiation.
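Steps 2 to 4 amount to two registries keyed by operator name, so that a gradient can be looked up from the operator alone. The sketch below shows that association in plain Python. It does not use or reproduce TensorFlow's own registration mechanism, and every name in it (`register_op`, `register_gradient`, `CryptoMul`) is hypothetical.

```python
CRYPTO_OPS = {}        # operator name -> forward function
CRYPTO_GRADIENTS = {}  # operator name -> gradient function


def register_op(name):
    """Decorator that registers a forward function under `name`."""
    def decorator(fn):
        CRYPTO_OPS[name] = fn
        return fn
    return decorator


def register_gradient(op_name):
    """Decorator that associates a gradient function with a registered op."""
    def decorator(fn):
        if op_name not in CRYPTO_OPS:
            raise KeyError(f"operator {op_name!r} is not registered")
        CRYPTO_GRADIENTS[op_name] = fn
        return fn
    return decorator


@register_op("CryptoMul")
def crypto_mul(x, y):
    # Placeholder arithmetic: a real cryptographic operator would work on
    # protected values (shares or ciphertexts), not plain floats.
    return x * y


@register_gradient("CryptoMul")
def crypto_mul_grad(x, y, upstream):
    # d(x*y)/dx = y and d(x*y)/dy = x, scaled by the upstream gradient.
    return upstream * y, upstream * x


def gradient_for(op_name):
    """What automatic differentiation would do: look up by operator name."""
    return CRYPTO_GRADIENTS[op_name]


grads = gradient_for("CryptoMul")(3.0, 4.0, 1.0)
```

Refusing to register a gradient for an unknown operator keeps the two registries consistent, which is the property the association in Step 4 relies on.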
- Step 5 The framework developer uses C/C++ to write an optimizer program that replaces the native plaintext operators in the TensorFlow Graph (tensor flow graph) corresponding to the plaintext machine learning model with the corresponding cryptographic operators.
- the optimizer program may include a static optimizer program (Static Pass) or a dynamic optimizer program (Dynamic Pass) or both.
- the static optimizer program performs the replacement before the TensorFlow Graph is executed.
- the dynamic optimizer program performs the replacement while the TensorFlow Graph is being executed. Combining the static optimizer program and the dynamic optimizer program can improve efficiency and flexibility.
- the method in this specific embodiment adds no other application program interfaces and no other private data types. It only needs to use cryptographic operators automatically, at the bottom layer of the TensorFlow framework, to replace the framework's native plaintext operators. For users, therefore, coding a private machine learning algorithm in this solution is the same as coding a plaintext machine learning algorithm, so this solution is easier to use than the solutions implemented by existing privacy frameworks.
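The difference between the static and dynamic optimizer programs can be sketched on a toy graph. This is an illustration only: the list-of-dictionaries graph format, the `REPLACEMENTS` table, and the operator names are assumptions, and a real pass would rewrite a TensorFlow Graph rather than a Python list.

```python
# Hypothetical mapping from native plaintext operators to cryptographic ones.
REPLACEMENTS = {"MatMul": "CryptoMatMul", "Add": "CryptoAdd"}


def static_pass(graph):
    """Rewrite a copy of the whole graph before anything is executed."""
    return [
        {**node, "op": REPLACEMENTS.get(node["op"], node["op"])}
        for node in graph
    ]


def run(graph, dynamic=False):
    """Walk the nodes in order and return the operator actually run for each.

    With dynamic=True the replacement is decided node by node at execution
    time and the graph itself is left untouched.
    """
    executed = []
    for node in graph:
        op = node["op"]
        if dynamic:
            op = REPLACEMENTS.get(op, op)
        executed.append(op)
    return executed


graph = [
    {"name": "h", "op": "MatMul"},
    {"name": "y", "op": "Add"},
    {"name": "s", "op": "Shape"},
]
static_trace = run(static_pass(graph))
dynamic_trace = run(graph, dynamic=True)
```

Both routes end up running the same cryptographic operators. The static route pays the rewriting cost once up front, while the dynamic route can also handle nodes that only become known during execution.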
- Step 6 The framework developer uses C/C++ to write the source code for registering the static optimizer program and/or the dynamic optimizer program in the TensorFlow framework.
- Step 7 Build binary files, for example Rosetta.so or Rosetta.dll, from the source code written in the preceding steps: the source code implementing the cryptographic operators, the source code registering the cryptographic operators, the source code implementing the cryptographic operator gradient functions, the source code registering the cryptographic operator gradient functions, the source code implementing the optimizer program, and the source code registering the optimizer program.
- Step 8 Based on the obtained binary file, make a python pip installation package, such as Rosetta.whl.
- Step 9 Send the generated installation package to each data holder among the multiple data holders.
- Step 10 Each data holder installs the received installation package and adds one line of code, import Rosetta, to its plaintext machine learning model. The installed file is thereby imported into the plaintext machine learning model to obtain the corresponding privacy machine learning model.
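Why one added import line can be enough is easiest to see with a stand-in: a module whose top-level code rewires an operator table as a side effect of being imported. The sketch below builds two throwaway modules in a temporary directory. The module names `framework_sketch` and `privacy_sketch` and the `KERNELS` table are hypothetical and are not part of Rosetta or TensorFlow.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in "framework": a table of kernels, all plaintext at first.
FRAMEWORK_SRC = 'KERNELS = {"MatMul": "plaintext", "Add": "plaintext"}\n'

# Stand-in "privacy package": its top-level code runs once, at import time,
# and swaps every registered kernel for a cryptographic one.
PACKAGE_SRC = (
    "import framework_sketch\n"
    "for _op in framework_sketch.KERNELS:\n"
    "    framework_sketch.KERNELS[_op] = 'crypto'\n"
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "framework_sketch.py").write_text(FRAMEWORK_SRC)
    Path(tmp, "privacy_sketch.py").write_text(PACKAGE_SRC)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        framework = importlib.import_module("framework_sketch")
        before = dict(framework.KERNELS)
        importlib.import_module("privacy_sketch")  # the one added line
        after = dict(framework.KERNELS)
    finally:
        sys.path.remove(tmp)
```

The model code that uses `framework.KERNELS` does not change at all, which mirrors the claim that the user's algorithm code stays the same as in the plaintext case.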
- Step 11 Multiple data holders execute secure multi-party calculations to jointly train the private machine learning model based on the private sample data stored in each data holder to obtain the target machine learning model.
- the target machine learning model can be used for subsequent prediction of data to be predicted.
- FIG. 3 shows a block diagram of a data processing method for implementing privacy protection provided in an embodiment of this specification.
- the block diagram mainly includes the following modules: Rosetta Static Pass (static optimizer module), Rosetta Dynamic Pass (dynamic optimizer module) and Rosetta Crypto Ops (cryptographic operator module).
- Rosetta Static Pass is a static graph optimization module based on the python programming interface. It is responsible for replacing the corresponding TF Native OPs in the TensorFlow graph with the corresponding cryptographic operators (Crypto OPs) before the execution of the TensorFlow graph.
- These Crypto OPs can include cryptographic operators such as secure multi-party computation operators (MPC Ops), homomorphic encryption operators (HE Ops), and zero-knowledge proof operators (ZK Ops).
- which cryptographic operators are used for the replacement depends on the user's configuration: if the user configures secure multi-party computation, TF Native OPs are replaced with MPC Ops; likewise, if the user configures HE or ZK, TF Native OPs are replaced with HE Ops or ZK Ops. The module can be implemented entirely in C/C++.
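The configuration-driven choice described above can be sketched as a lookup keyed first by the configured protocol and then by the native operator. The table contents, the operator names, and the function `crypto_op_for` are hypothetical; the specification does not name individual operators.

```python
# Hypothetical operator families, one per configurable protocol.
OPERATOR_FAMILIES = {
    "MPC": {"MatMul": "MpcMatMul", "Add": "MpcAdd"},
    "HE": {"MatMul": "HeMatMul", "Add": "HeAdd"},
    "ZK": {"MatMul": "ZkMatMul", "Add": "ZkAdd"},
}


def crypto_op_for(native_op, protocol):
    """Return the replacement for `native_op` under the configured protocol.

    Operators with no cryptographic counterpart are returned unchanged.
    """
    try:
        family = OPERATOR_FAMILIES[protocol]
    except KeyError:
        raise ValueError(f"unknown protocol: {protocol!r}") from None
    return family.get(native_op, native_op)


chosen = crypto_op_for("MatMul", "HE")
```

Keeping the protocol as the outer key means that switching a whole model from MPC to HE is a single configuration change rather than a per-operator edit.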
- Rosetta Dynamic Pass is a dynamic graph optimization module based on the C/C++ programming interface. It is responsible for replacing the corresponding TF Native OPs in the TensorFlow graph with the corresponding Crypto OPs when the TensorFlow graph is executed. It can all be implemented in C/C++.
- Rosetta Crypto Ops is a module used to implement various types of Crypto Ops, which can include encryption operators such as MPC Ops, HE Ops, ZK Ops, etc. These Crypto Ops correspond to TF Native Ops and can all be implemented in C/C++.
- the embodiment of this specification also provides a data processing device for realizing privacy protection.
- the device is located in each of the multiple data holders, and each of the multiple data holders stores its own private sample data, as described in the following embodiment. Since the data processing device for realizing privacy protection solves the problem on a principle similar to that of the data processing method for realizing privacy protection, the implementation of the device can refer to the implementation of the method, and repeated details are not described again here.
- the term "unit" or "module" can be a combination of software and/or hardware that implements a predetermined function.
- Fig. 4 is a structural block diagram of a data processing device for implementing privacy protection according to an embodiment of this specification. As shown in Fig. 4, it includes: a replacement module 401 and a training module 402. The structure is described below.
- the replacement module 401 is configured to replace the plaintext operator in the preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model.
- the training module 402 is used to perform secure multi-party calculations to jointly train the private machine learning model based on the respective private sample data stored in each data holder, and output the target machine learning model.
- the replacement module can be used to: obtain multiple cryptographic operators and register each of them; and obtain an optimizer program and register it, where the optimizer program is used to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
- the optimizer program is also used to replace the plaintext operator gradient function in the preset plaintext machine learning model with the corresponding cryptographic operator gradient function; accordingly, the replacement module can also be used to: after obtaining the multiple cryptographic operators and registering each of them, obtain the cryptographic operator gradient function corresponding to each cryptographic operator and register it; and associate each registered cryptographic operator gradient function with the corresponding registered cryptographic operator.
- the replacement module can be used to: obtain an installation package file, where the installation package file is built from a binary file, the binary file is compiled from the source code implementing multiple cryptographic operators, the source code implementing the optimizer program, the source code registering the multiple cryptographic operators, and the source code registering the optimizer program, and the optimizer program, when executed, replaces each plaintext operator in the plaintext model with the corresponding cryptographic operator; install the installation package file; and import the installed file into the preset plaintext machine learning model to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
- the replacement module can be used to: determine the plaintext operators that need to be replaced in the preset plaintext machine learning model based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model; and replace those plaintext operators with the corresponding cryptographic operators.
- the optimizer program includes at least one of the following: a static optimizer program and a dynamic optimizer program.
- the cryptographic operator includes at least one of the following: a secure multi-party computing operator, a homomorphic encryption operator, and a zero-knowledge proof operator.
- the embodiments of this specification achieve the following technical effects: the corresponding privacy machine learning model is obtained simply by replacing the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator. By performing secure multi-party computation, multiple data holders can then jointly train the privacy machine learning model on their respective private sample data to obtain the target machine learning model, which effectively protects the privacy of each data holder's private sample data.
- the above solution adds no other application program interfaces and no other private data types. It only needs to replace the plaintext operator in the plaintext machine learning model with the corresponding cryptographic operator, which makes it simple to operate and easy to use.
- the embodiment of this specification also provides a computer device.
- the computer device may specifically include an input device 51, a processor 52 and a memory 53.
- the memory 53 is used to store processor executable instructions.
- when the processor 52 executes the instructions, the steps of the data processing method for implementing privacy protection described in any of the foregoing embodiments are implemented.
- the input device may specifically be one of the main devices for information exchange between the user and the computer system.
- the input device may include a keyboard, a mouse, a camera, a scanner, a light pen, a handwriting input board, a voice input device, etc.; the input device is used to input raw data, and the programs that process this data, into the computer.
- the input device can also obtain and receive data transmitted from other modules, units, and devices.
- the processor can be implemented in any suitable way.
- the processor may take the form of, for example, a microprocessor or processor together with a computer-readable medium, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller.
- the memory may specifically be a memory device used to store information in modern information technology.
- the memory can include multiple levels. In a digital system, anything that can store binary data can be a memory; in an integrated circuit, a circuit with a storage function but no physical form of its own is also called a memory, such as RAM or a FIFO; in a system, storage devices in physical form are also called memory, such as memory sticks and TF cards.
- the embodiment of this specification also provides a computer storage medium based on a data processing method for realizing privacy protection.
- the computer storage medium stores computer program instructions.
- when the computer program instructions are executed, the steps of the data processing method for realizing privacy protection described in any of the foregoing embodiments are implemented.
- the above-mentioned storage medium includes, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), cache (Cache), hard disk (Hard Disk Drive, HDD), or memory card (Memory Card).
- the memory can be used to store computer program instructions.
- the network communication unit may be an interface set up in accordance with a standard stipulated by the communication protocol and used for network connection communication.
- modules or steps of the above-mentioned embodiments of this specification can be implemented by a general computing device, and they can be concentrated on a single computing device or distributed among multiple computing devices.
- they can be implemented with program code executable by the computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be executed in an order different from the one given here, or they can each be fabricated into an individual integrated circuit module, or multiple modules or steps among them can be fabricated into a single integrated circuit module. In this way, the embodiments of this specification are not limited to any specific combination of hardware and software.
Claims (10)
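- A data processing method for realizing privacy protection, characterized in that the method is executed by multiple data holders, each of the multiple data holders storing its own private sample data, the method comprising: replacing a plaintext operator in a preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator, to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model; and performing secure multi-party computation to jointly train the privacy machine learning model based on the respective private sample data stored in each data holder, and outputting a target machine learning model.
- The method according to claim 1, characterized in that replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator comprises: obtaining multiple cryptographic operators and registering each of the multiple cryptographic operators; and obtaining an optimizer program and registering the optimizer program, wherein the optimizer program is used to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
- The method according to claim 2, characterized in that the optimizer program is further used to replace the plaintext operator gradient function in the preset plaintext machine learning model with the corresponding cryptographic operator gradient function; accordingly, after obtaining the multiple cryptographic operators and registering each of the multiple cryptographic operators, the method further comprises: obtaining the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators, and registering the cryptographic operator gradient function corresponding to each cryptographic operator; and associating each registered cryptographic operator gradient function with the corresponding registered cryptographic operator.
- The method according to claim 1, characterized in that replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator comprises: obtaining an installation package file, wherein the installation package file is built from a binary file, the binary file is obtained by compiling the source code for implementing multiple cryptographic operators, the source code for implementing an optimizer program, the source code for registering the multiple cryptographic operators, and the source code for registering the optimizer program, and the optimizer program, when executed, is used to replace the plaintext operator in the plaintext model with the cryptographic operator corresponding to the plaintext operator; installing the installation package file; and importing the installed file into the preset plaintext machine learning model, to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
- The method according to claim 1, characterized in that replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator comprises: determining, based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model, the plaintext operators that need to be replaced in the preset plaintext machine learning model; and replacing the plaintext operators that need to be replaced in the preset plaintext machine learning model with the corresponding cryptographic operators.
- The method according to claim 2, characterized in that the optimizer program includes at least one of the following: a static optimizer program and a dynamic optimizer program.
- The method according to claim 2, characterized in that the cryptographic operator includes at least one of the following: a secure multi-party computation operator, a homomorphic encryption operator, and a zero-knowledge proof operator.
- A data processing device for realizing privacy protection, characterized in that the device is located in each of multiple data holders, each of the multiple data holders storing its own private sample data, the device comprising: a replacement module, configured to replace a plaintext operator in a preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator, to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model; and a training module, configured to perform secure multi-party computation to jointly train the privacy machine learning model based on the respective private sample data stored in each data holder, and to output a target machine learning model.
- A computer device, characterized by comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements the steps of the method according to any one of claims 1 to 7.
- A computer-readable storage medium storing computer instructions, characterized in that the instructions, when executed, implement the steps of the method according to any one of claims 1 to 7.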
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/080392 WO2021184347A1 (zh) | 2020-03-20 | 2020-03-20 | Data processing method and device for realizing privacy protection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/080392 WO2021184347A1 (zh) | 2020-03-20 | 2020-03-20 | Data processing method and device for realizing privacy protection |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021184347A1 (zh) | 2021-09-23 |
Family
ID=77769609
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/080392 WO2021184347A1 (zh) | Data processing method and device for realizing privacy protection | 2020-03-20 | 2020-03-20 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021184347A1 (zh) |
- 2020-03-20 WO PCT/CN2020/080392 patent/WO2021184347A1/zh active Application Filing
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20925420; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 20925420; Country of ref document: EP; Kind code of ref document: A1 |
 | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 11/11/2022) |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 20925420; Country of ref document: EP; Kind code of ref document: A1 |