WO2021184347A1 - Data processing method and apparatus for realizing privacy protection - Google Patents

Data processing method and apparatus for realizing privacy protection

Info

Publication number
WO2021184347A1
WO2021184347A1 PCT/CN2020/080392 CN2020080392W WO2021184347A1 WO 2021184347 A1 WO2021184347 A1 WO 2021184347A1 CN 2020080392 W CN2020080392 W CN 2020080392W WO 2021184347 A1 WO2021184347 A1 WO 2021184347A1
Authority
WO
WIPO (PCT)
Prior art keywords
operator
plaintext
machine learning
learning model
data
Prior art date
Application number
PCT/CN2020/080392
Other languages
English (en)
French (fr)
Inventor
陈元丰
谢翔
晏意林
黄高峰
史俊杰
李升林
孙立林
Original Assignee
云图技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 云图技术有限公司
Priority to PCT/CN2020/080392 priority Critical patent/WO2021184347A1/zh
Publication of WO2021184347A1 publication Critical patent/WO2021184347A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • This application relates to the field of data security technology, and in particular to a data processing method and device for realizing privacy protection.
  • the embodiments of this specification provide a data processing method and device for realizing privacy protection, so as to solve the problem of poor usability of privacy-preserving machine learning frameworks in the prior art.
  • the embodiment of this specification provides a data processing method for realizing privacy protection, which is executed by multiple data holders, and each of the multiple data holders stores its own private sample data.
  • the method includes: replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model; and performing secure multi-party computation to jointly train the privacy machine learning model based on the private sample data stored by each data holder, and outputting the target machine learning model.
  • replacing the plaintext operator in the preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator includes: obtaining a plurality of cryptographic operators, and registering each of the plurality of cryptographic operators; and obtaining an optimizer program, and registering the optimizer program, where the optimizer program is used to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
  • the optimizer program is also used to replace the plaintext operator gradient function in the preset plaintext machine learning model with the corresponding cryptographic operator gradient function; accordingly, after obtaining the multiple cryptographic operators and registering each of them, the method also includes: obtaining the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators, and registering the cryptographic operator gradient function corresponding to each cryptographic operator; and associating the registered cryptographic operator gradient function with the corresponding registered cryptographic operator.
  • replacing the plaintext operator in the preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator includes: obtaining an installation package file, where the installation package file is constructed based on a binary file, and the binary file is compiled from the source code used to implement multiple cryptographic operators, the source code used to implement the optimizer program, the source code used to register the multiple cryptographic operators, and the source code used to register the optimizer program, the optimizer program, when executed, being used to replace the plaintext operator in the plaintext model with the cryptographic operator corresponding to the plaintext operator; installing the installation package file; and importing the installed file into the preset plaintext machine learning model to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
  • replacing the plaintext operator in the preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator includes: determining, based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model, the plaintext operators that need to be replaced in the preset plaintext machine learning model; and replacing the plaintext operators that need to be replaced in the preset plaintext machine learning model with the corresponding cryptographic operators.
  • the optimizer program includes at least one of the following: a static optimizer program and a dynamic optimizer program.
  • the cryptographic operator includes at least one of the following: a secure multi-party computing operator, a homomorphic encryption operator, and a zero-knowledge proof operator.
  • the embodiment of this specification also provides a data processing device that realizes privacy protection, which is located in each of the multiple data holders, and each of the multiple data holders stores its own private sample data.
  • the device includes: a replacement module, used to replace the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model; and a training module, used to perform secure multi-party computation to jointly train the privacy machine learning model based on the respective private sample data stored by each data holder, and output the target machine learning model.
  • the embodiments of this specification also provide a computer device, including a processor and a memory for storing instructions executable by the processor, where the processor, when executing the instructions, implements the steps of the data processing method for realizing privacy protection described in any of the foregoing embodiments.
  • the embodiments of this specification also provide a computer-readable storage medium on which computer instructions are stored, and when the instructions are executed, the steps of the data processing method for realizing privacy protection described in any of the foregoing embodiments are implemented.
  • a data processing method for realizing privacy protection is provided, which is executed by multiple data holders.
  • Each data holder stores its own private sample data, and each data holder replaces the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator to obtain the privacy machine learning model.
  • Then, the multiple data holders perform secure multi-party computation to jointly train the privacy machine learning model based on their private sample data, and output the target machine learning model.
  • the privacy machine learning model is jointly trained based on the respective private sample data to obtain the target machine learning model, which can effectively protect the privacy of the private sample data of each data holder.
  • the above solution does not need to add other application program interfaces, nor does it need to add other private data types; it only needs to replace the plaintext operator in the plaintext machine learning model with the corresponding cryptographic operator, which is convenient and simple to operate and easy to use.
  • FIG. 1 shows a schematic diagram of an application scenario of a data processing method for realizing privacy protection in an embodiment of this specification
  • Figure 2 shows a flow chart of a data processing method for implementing privacy protection in an embodiment of this specification
  • Figure 3 shows a block diagram of a data processing method for implementing privacy protection in an embodiment of this specification
  • FIG. 4 shows a schematic diagram of a data processing device for implementing privacy protection in an embodiment of this specification
  • Fig. 5 shows a schematic diagram of a computer device in an embodiment of this specification.
  • a plaintext machine learning model can be written in a machine learning framework.
  • the plaintext machine learning model may include local plaintext operators provided by the machine learning framework.
  • the local plaintext operator in the plaintext machine learning model can be replaced with the corresponding cryptographic operator to obtain the corresponding private machine learning model. This replacement is transparent to the user.
  • the private machine learning model can be trained based on the private sample data to obtain the target machine learning model.
  • FIG. 1 shows a schematic diagram of the above-mentioned scenario example.
  • three data holders are exemplarily shown: data holder 1, data holder 2 and data holder 3.
  • the data holder 1 stores the first privacy sample data
  • the data holder 2 stores the second privacy sample data
  • the data holder 3 stores the third privacy sample data.
  • Data holder 1, data holder 2 and data holder 3 can obtain the preset plaintext machine learning model to be trained, and replace the plaintext operator in the preset plaintext machine learning model with a cryptographic operator to obtain the privacy machine learning model corresponding to the preset plaintext machine learning model.
  • the data holder 1, the data holder 2 and the data holder 3 can perform secure multi-party calculations to jointly train the private machine learning model based on their respective private sample data to obtain the target machine learning model. Joint training of privacy machine learning models through secure multi-party computing can protect the privacy sample data of each data holder from being leaked, effectively protecting data privacy.
  • the preset plaintext machine learning model may be implemented based on the plaintext machine learning framework.
  • the plaintext machine learning framework can be any existing plaintext machine learning framework, for example, TensorFlow, Pytorch, MxNet, CNTK-Azure and other frameworks. Therefore, this specification does not limit the specific plaintext machine learning framework used to generate the plaintext machine learning model, and it can be selected according to actual needs.
  • the cryptographic operator can be any cryptographic operator that can provide privacy protection for the input data of all parties in a scenario where two or more data holders jointly (or collaboratively) perform machine learning training and prediction.
  • the cryptographic operator may be a secure multi-party computation (Secure Multi-Party Computation, MPC) operator, a homomorphic encryption (Homomorphic Encryption, HE) operator, or a zero-knowledge proof (Zero-Knowledge Proof, ZKP) operator, etc.
  • this specification does not limit the specific cryptographic operators used, which can be selected according to actual needs.
  • developers can write the source code for implementing the cryptographic operators corresponding to the native plaintext operators in the machine learning framework and the source code for registering the cryptographic operators in the machine learning framework, write the source code for implementing the gradient functions of the cryptographic operators and the source code for registering the cryptographic operator gradient functions in the machine learning framework, and write the source code for implementing the optimizer program and the source code for registering the optimizer program in the machine learning framework.
  • the optimizer program is used to replace the plaintext operator in the plaintext machine learning model with the corresponding cryptographic operator.
  • developers can then compile the source code for implementing the cryptographic operators, the source code for registering the cryptographic operators, the source code for the cryptographic operator gradient functions, the source code for registering the cryptographic operator gradient functions, the source code for the optimizer program and the source code for registering the optimizer program into a binary file.
  • a python installation package is generated based on the binary file (for example, the file name is Rosetta.whl).
  • the data holder 1, the data holder 2 and the data holder 3 can download the installation package and install it. Users of data holder 1, data holder 2 and data holder 3 only need to add a line of code "import Rosetta" under the plaintext machine learning model written by them to support the privacy protection function.
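  • As a hedged illustration of this usage (the logistic-regression model below is a generic TensorFlow 1.x-style example written for this description, not code taken from the application, and it assumes the Rosetta package described above has been built and installed), the only change to an existing plaintext script would be the single added import line:

```python
import numpy as np
import tensorflow as tf  # assumes a TensorFlow 1.x environment
import Rosetta  # the single added line; importing the package registers the
                # cryptographic operators and the operator-replacing optimizer

# an ordinary plaintext logistic-regression model, written exactly as before
x = tf.placeholder(tf.float64, [None, 2])
y = tf.placeholder(tf.float64, [None, 1])
w = tf.Variable(tf.zeros([2, 1], dtype=tf.float64))
b = tf.Variable(tf.zeros([1], dtype=tf.float64))
logits = tf.matmul(x, w) + b
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# each holder feeds only its own private samples (dummy values shown here);
# the replaced operators carry out the joint secure computation underneath
features = np.array([[0.1, 0.2], [0.3, 0.4]])
labels = np.array([[0.0], [1.0]])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op, feed_dict={x: features, y: labels})
```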
  • secure multi-party calculations can be performed to jointly train the model based on private sample data to obtain a trained target machine learning model. After the target machine learning model is obtained, the target machine learning model can be used to make predictions.
  • the target machine learning model obtained by each data holder through joint training may be a complete machine learning model, that is, the model parameters are complete.
  • each data holder can obtain the data to be predicted, and input the data to be predicted into the trained complete machine learning model to obtain the prediction result.
  • the data holder 1 can obtain the data to be predicted, and the prediction result can be obtained by inputting the data to be predicted into the complete machine learning model.
  • the target machine learning model obtained by each data holder through joint training may be an incomplete machine learning model, that is, the model parameters are incomplete.
  • one of the multiple data holders can obtain the data to be predicted, and the data holder and other data holders perform multi-party security calculations to obtain the joint prediction result of the data to be predicted .
  • the data holder 1 can obtain the data to be predicted.
  • the data holder 1 and the data holder 2 and the data holder 3 perform secure multi-party calculations to perform joint prediction based on the data to be predicted and the target machine learning model to obtain a joint prediction result.
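  • The incomplete-parameter case can be pictured with plain additive secret sharing. The sketch below is only a conceptual illustration, not the application's protocol: each of three holders keeps one additive share of a linear model's weights, and a prediction is reconstructed from locally computed partial results (for simplicity the sample itself is visible to all parties here, whereas a real secure multi-party computation would also protect it):

```python
import numpy as np

rng = np.random.default_rng(0)

def share(value, n_parties=3):
    """Split a value into n additive shares that sum back to the value."""
    shares = [rng.normal(size=np.shape(value)) for _ in range(n_parties - 1)]
    shares.append(value - sum(shares))
    return shares

w = np.array([0.5, -1.2, 0.7])      # trained weights, never held in full by one party
w_shares = share(w)                 # party i keeps only w_shares[i]
x = np.array([1.0, 2.0, 3.0])       # sample to be predicted, held by party 1

# each party computes a partial inner product on its own share; summing the
# partial results reconstructs <x, w> without revealing any party's share of w
partial_results = [x @ w_i for w_i in w_shares]
joint_prediction = sum(partial_results)
assert np.isclose(joint_prediction, x @ w)
```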
  • the embodiment of this specification provides a data processing method for realizing privacy protection, which is executed by multiple data holders, and each of the multiple data holders stores its own private sample data.
  • Fig. 2 shows a schematic diagram of an application scenario of a data processing method for realizing privacy protection in an embodiment of this specification.
  • although this specification provides method operation steps or device structures as shown in the following embodiments or drawings, more or fewer operation steps or module units may be included in the method or device based on conventional practice or without creative effort.
  • for steps or structures between which there is logically no necessary causal relationship, the execution order of these steps or the module structure of the device is not limited to the execution order or module structure shown in the description of the embodiments of this specification and the drawings.
  • when the described method or module structure is applied in an actual device or terminal product, it can be executed sequentially or in parallel according to the method or module structure shown in the embodiments or drawings (for example, in a parallel-processor or multi-threaded processing environment, or even a distributed processing environment).
  • the data processing method for realizing privacy protection may include the following steps.
  • Step S201: Replace the plaintext operator in the preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model.
  • each of the multiple data holders may store a preset plaintext machine learning model.
  • the preset plaintext machine learning model may be a machine model established by a user in a machine learning framework.
  • the machine learning framework can refer to all machine learning systems or methods including machine learning algorithms, and can include data representation and processing methods, methods for representing and suggesting predictive models, and methods for evaluating and using modeling results.
  • the machine learning framework can include one of the following: TensorFlow, Pytorch, MxNet, CNTK-Azure and other frameworks.
  • the machine learning framework can include multiple local plaintext operators.
  • the preset plaintext machine learning model established based on the machine learning framework may include local plaintext operators (referred to as plaintext operators) in the machine learning framework.
  • Each of the multiple data holders may store their own private sample data.
  • In order to ensure that their respective private sample data is not leaked to other data holders, each data holder can replace the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator to obtain the privacy machine learning model corresponding to the preset plaintext machine learning model.
  • The cryptographic operator corresponding to a plaintext operator refers to an operator that implements the operation of that plaintext operator under encryption.
  • Step S202: Perform secure multi-party computation to jointly train the privacy machine learning model based on the respective private sample data stored by each data holder, and output the target machine learning model.
  • After each data holder obtains the privacy machine learning model, the multiple data holders can perform secure multi-party computation to jointly train the privacy machine learning model based on the respective private sample data stored by each data holder.
  • By jointly training the privacy machine learning model, the model parameters in the privacy machine learning model can be determined, and the trained target machine learning model can be obtained.
  • After the target machine learning model is output, each data holder can process the data to be predicted based on the target machine learning model to obtain a prediction result.
  • the method in the above embodiment only needs to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator to obtain the corresponding privacy machine learning model; the multiple data holders then perform secure multi-party computation to jointly train the privacy machine learning model based on their respective private sample data and obtain the target machine learning model, which can effectively protect the privacy of the private sample data of each data holder.
  • the above solution does not need to add other application program interfaces, nor does it need to add other private data types; it only needs to replace the plaintext operator in the plaintext machine learning model with the corresponding cryptographic operator, which is convenient and simple to operate and easy to use.
  • replacing the plaintext operator in the preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator may include: obtaining multiple cryptographic operators and registering each of the multiple cryptographic operators; and obtaining an optimizer program and registering the optimizer program, where the optimizer program is used to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
  • each data holder can obtain multiple cryptographic operators.
  • each of the multiple cryptographic operators can correspond to one of the plaintext operators in the machine learning framework; that is, the obtained multiple cryptographic operators may include the cryptographic operators corresponding to the plaintext operators in the preset plaintext machine learning model.
  • each data holder can receive multiple cryptographic operators input by the user.
  • multiple cryptographic operators may be stored in the server, and each data holder may send an acquisition request to the server, and the server sends the multiple cryptographic operators to each data holder in response to the acquisition request.
  • each data holder can register each cryptographic operator into the machine learning framework. For example, each data holder may register each of the multiple cryptographic operators into the machine learning framework based on the operator registration interface provided by the machine learning framework.
  • Each data holder can obtain the optimizer program.
  • the optimizer program is used to replace the plaintext operator in the preset plaintext machine learning model established based on the machine learning framework with the corresponding cryptographic operator.
  • each data holder can receive an optimizer program input by the user.
  • the optimizer program may be stored in the server, and each data holder may send an acquisition request to the server, and the server sends the optimizer program to each data holder in response to the acquisition request.
  • each data holder can register the optimizer program in the machine learning framework. For example, each data holder may register the optimizer program in the machine learning framework based on the optimizer registration interface provided by the machine learning framework.
  • By registering the multiple cryptographic operators and the optimizer program into the machine learning framework, the plaintext operators in the preset plaintext machine learning model established in the machine learning framework can be replaced with the corresponding cryptographic operators to obtain the privacy machine learning model corresponding to the preset plaintext machine learning model.
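  • At the Python level, one common way such compiled operator libraries surface in TensorFlow (a hedged sketch; the file name and the operator name below are hypothetical and not taken from the application) is via tf.load_op_library, which runs the static registration code in the shared object and exposes every registered operator as a Python function:

```python
import tensorflow as tf

# loading the shared object executes its registration code, so the cryptographic
# operators (and any graph passes) compiled into it become known to the framework
crypto_module = tf.load_op_library("./libcrypto_ops.so")

# a registered operator is then callable like a built-in op, for example:
# z = crypto_module.secure_mat_mul(a, b)   # "secure_mat_mul" is a hypothetical name
```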
  • the optimizer program may include at least one of the following: a static optimizer program and a dynamic optimizer program.
  • the static optimizer program refers to replacing the plaintext operator in the machine learning model with the corresponding cryptographic operator before the data flow graph in the plaintext machine learning model is executed.
  • the dynamic optimizer program refers to replacing the plaintext operator in the plaintext machine learning model with the corresponding cryptographic operator during the execution of the data flow graph in the plaintext machine learning model.
  • the optimizer program may include a static optimizer program, may also include a dynamic optimizer program, or may include both a dynamic optimizer program and a static optimizer program.
  • the optimizer program may include at least one of a static optimizer program and a dynamic optimizer program, which has high flexibility.
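  • A static optimizer pass can be pictured as a transformation of the serialized graph before any session executes it. The minimal sketch below is an assumption-laden illustration (the operator names in the mapping are hypothetical, and a real pass would also adjust node attributes): it rewrites the op type of selected nodes in a GraphDef and re-imports the result:

```python
import tensorflow as tf

# hypothetical mapping from native plaintext op types to registered cryptographic op types
REPLACEMENTS = {"MatMul": "SecureMatMul", "Sigmoid": "SecureSigmoid"}

def static_pass(graph_def):
    """Rewrite selected plaintext node types before the graph is executed."""
    for node in graph_def.node:
        if node.op in REPLACEMENTS:
            node.op = REPLACEMENTS[node.op]
    return graph_def

# usage sketch (requires the replacement op types to be registered beforehand):
# rewritten = static_pass(tf.compat.v1.get_default_graph().as_graph_def())
# with tf.Graph().as_default():
#     tf.import_graph_def(rewritten, name="")
```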
  • the optimizer program can also be used to replace the plaintext operator gradient function in the preset plaintext machine learning model with the corresponding cryptographic operator gradient function; accordingly, after obtaining the multiple cryptographic operators and registering each of them, the method may also include: obtaining the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators, and registering the cryptographic operator gradient function corresponding to each cryptographic operator; and associating the registered cryptographic operator gradient function with the corresponding registered cryptographic operator.
  • each data holder can obtain the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators. For example, each data holder may receive the cryptographic operator gradient function corresponding to each cryptographic operator input by the user.
  • the cryptographic operator gradient function corresponding to each cryptographic operator can be stored in the server, and each data holder can send an acquisition request to the server, and the server responds to the acquisition request to convert the cryptographic operator gradient function corresponding to each cryptographic operator Send to each data holder.
  • each data holder can register the cryptographic operator gradient function corresponding to each cryptographic operator into the machine learning framework. For example, each data holder may register them based on the gradient function registration interface provided by the machine learning framework. Each data holder can also associate each cryptographic operator with the cryptographic operator gradient function corresponding to it. By associating the cryptographic operator with the corresponding gradient function, the machine learning framework can find the matching cryptographic operator gradient function from the cryptographic operator when performing automatic differentiation.
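  • In TensorFlow, this association is what the gradient registration mechanism provides. The sketch below registers a Python-side gradient for a hypothetical secure operator; the operator name, and the choice of Python rather than C++, are illustrative assumptions rather than the application's actual code:

```python
import tensorflow as tf

@tf.RegisterGradient("SecureSigmoid")   # hypothetical cryptographic operator name
def _secure_sigmoid_grad(op, grad):
    """Looked up automatically during back-propagation whenever a
    SecureSigmoid node appears in the data flow graph."""
    y = op.outputs[0]
    # same algebraic form as the plaintext sigmoid gradient; a real implementation
    # would build it from the corresponding secure operators
    return grad * y * (1.0 - y)
```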
  • replacing the plaintext operator in the preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator may include: obtaining an installation package file, where the installation package file is constructed based on a binary file, and the binary file is compiled from the source code used to implement the multiple cryptographic operators, the source code used to implement the optimizer program, the source code used to register the multiple cryptographic operators, and the source code used to register the optimizer program, the optimizer program, when executed, being used to replace the plaintext operator in the plaintext model with the corresponding cryptographic operator; installing the installation package file; and importing the installed file into the preset plaintext machine learning model to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
  • the replacement of the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator can be achieved in the form of an installation package.
  • the source code for implementing multiple cryptographic operators, the source code for registering multiple cryptographic operators, the source code for implementing the optimizer program, and the source code for registering the optimizer program can be compiled into a binary file.
  • an installation package file can be generated based on the compiled binary file.
  • Each data holder can obtain the installation package file.
  • the installation package file can be stored in the server.
  • Each data holder can send a download request to the server, and the server sends an installation package file to each data holder in response to the download request.
  • each data holder can install the installation package file to obtain the installed file.
  • the installed file may include code for interpretation and execution and executable code.
  • each data holder can import the installed file into the plaintext machine learning model, thereby replacing the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
  • users of each data holder can add a line of code to the preset plaintext model (for example, import Rosetta, where Rosetta is the name of the installation package) to import the installed files into the preset plaintext machine learning model, so that the plaintext operators in the plaintext machine learning model are replaced with the corresponding cryptographic operators.
  • the method in the above embodiment only needs to add a line of code to the preset plaintext machine learning model to obtain the privacy machine learning model corresponding to the plaintext machine learning model.
  • the operation is convenient and simple, there is no need to add other application program interfaces or other private data types, and the solution is easy to use.
  • the plaintext machine learning model can be ported very conveniently through the above method. In other privacy machine learning frameworks, porting a plaintext machine learning model requires rewriting it with the corresponding private data types and privacy application program interfaces, whereas with this solution only one import line needs to be added to the source code of the plaintext model to support privacy protection, so the portability is extremely high. In addition, the model can be easily extended through the above method: extending another privacy machine learning framework to support an additional cryptographic algorithm requires, besides implementing the new cryptographic operators, adding the corresponding application program interfaces and private data types, whereas this solution only needs the plaintext operators to be replaced with the corresponding cryptographic operators.
  • replacing the plaintext operator in the preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator may include: determining, based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model, the plaintext operators that need to be replaced in the preset plaintext machine learning model; and replacing the plaintext operators that need to be replaced in the preset plaintext machine learning model with the corresponding cryptographic operators.
  • the plaintext operator to be replaced in the preset plaintext machine learning model may be determined first.
  • the plaintext operators that need to be replaced in the preset plaintext machine learning model can be determined based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model.
  • the data flow graph can be used to represent the data flow information in the plaintext machine learning model.
  • the data flow graph is a tensor flow graph.
  • the nodes in the tensorflow graph represent mathematical operations in the graph, and the lines in the graph represent multi-dimensional data arrays that are interconnected between nodes, that is, tensors.
  • In one implementation, considering that the plaintext operator in the preset plaintext machine learning model is replaced with the operator corresponding to the plaintext operator in order to protect the privacy of the private sample data stored by each holder, the operators through which the private sample data flows can be determined as the plaintext operators to be replaced.
  • In another implementation, considering that the private sample data is used to train the model to obtain model parameters (also referred to as training variables), the operators through which the training variables flow can be determined as the plaintext operators to be replaced.
  • After the plaintext operators to be replaced have been determined, the plaintext operators that need to be replaced in the preset plaintext machine learning model can be replaced with the corresponding cryptographic operators.
  • In the above embodiment, based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model, the plaintext operators to be replaced can be determined, thereby completing the replacement.
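  • A hedged sketch of this idea in Python (TensorFlow 1.x graph API; the choice of starting tensors and the forward-only traversal are illustrative assumptions): starting from the tensors that carry private sample data or training variables, walk the data flow graph forward and collect every operation they reach.

```python
import tensorflow as tf

def ops_to_replace(private_tensors):
    """Return the set of operations that private data or training variables
    flow through, i.e. candidate plaintext operators for replacement."""
    to_visit = list(private_tensors)
    reached = set()
    while to_visit:
        tensor = to_visit.pop()
        for op in tensor.consumers():        # operations fed by this tensor
            if op not in reached:
                reached.add(op)
                to_visit.extend(op.outputs)  # follow the data flow onwards
    return reached

# usage sketch: start from the sample placeholders and the training variables
# private = [x, y] + [v.value() for v in tf.compat.v1.trainable_variables()]
# candidates = ops_to_replace(private)
```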
  • the cryptographic operator can provide privacy protection for the input data of all parties in the joint (or collaborative) machine learning model training and prediction scenarios of two or more data holders.
  • the cryptographic operators may include secure multi-party computation operators (MPC OPs), homomorphic encryption operators (HE OPs), or zero-knowledge proof operators (ZK OPs).
  • the secure multi-party calculation operator may include one or more of Garbled Circuit, Oblivious Transfer, and Secret Sharing.
  • the foregoing cryptographic operators are only exemplary, and this specification does not limit the specific cryptographic operators to be used, which can be selected according to actual needs.
  • the secure multi-party calculation in step S202 can be executed based on the cryptographic operator replaced in step S201.
  • the secure multi-party calculation in step S202 may be a generalized secure multi-party calculation, and may include at least one of MPC, HE, and ZK.
  • For example, when the plaintext operators in the preset plaintext machine learning model are replaced with MPC OPs, the multiple data holders perform MPC computation when training the model; when they are replaced with HE OPs, the data holders perform HE computation when training the model; and when they are replaced with ZK OPs, the data holders perform ZK computation when training the model.
  • the cryptographic operator can be implemented in C language or C++ language.
  • By adopting the C or C++ language to implement the cryptographic operators, the performance of the machine learning model can be improved.
  • Compared with other privacy machine learning solutions in which the cryptographic algorithms are implemented using python, the cryptographic operators in this solution are all implemented using C/C++, so the performance is high.
  • other languages can also be used to implement cryptographic operators, such as Python language.
  • the TensorFlow framework is taken as an example for description.
  • the data processing method for implementing privacy protection may include the following steps.
  • Step 1 The framework developer uses C/C++ to write the source code that implements, on various devices, the cryptographic operators (Crypto OPs) corresponding to the native plaintext operators (TF Native OPs) of the TensorFlow framework.
  • Since the TensorFlow framework supports distributed data processing, it can be executed in a distributed manner on a variety of devices.
  • various devices may include CPU (Central Processing Unit), GPU (Graphics Processing Unit), or TPU (Tensor Processing Unit), etc. Since different devices have different instruction sets for implementing cryptographic operators, it is necessary to implement cryptographic operators suitable for different devices.
  • Step 2 The framework developer uses C/C++ to write the source code for registering the cryptographic operator in the TensorFlow framework.
  • Step 3 The framework developer uses C/C++ to write the source code for implementing the cryptographic operator gradient functions (Crypto OP Gradient Functions) corresponding to the cryptographic operator.
  • Step 4 The framework developer uses C/C++ to write the source code for registering the cryptographic operator gradient functions in the TensorFlow framework and associating the cryptographic operators with the cryptographic operator gradient functions. Associating the corresponding cryptographic operator with the cryptographic operator gradient function enables the TensorFlow framework to automatically find the matching cryptographic operator gradient function according to the cryptographic operator when performing automatic differentiation.
  • Step 5 The framework developer uses C/C++ to write an optimizer program for replacing the local plaintext operator in the TensorFlow Graph (tensor flow graph) corresponding to the plaintext machine learning model with the corresponding cryptographic operator.
  • the optimizer program may include a static optimizer program (Static Pass) or a dynamic optimizer program (Dynamic Pass) or both.
  • the function implemented by the static optimizer program is to replace before TensorFlow Graph is executed.
  • the function implemented by the dynamic optimizer program is to replace when TensorFlow Graph is executed. Combining the static optimizer program and the dynamic optimizer program can improve efficiency and flexibility.
  • the method in this specific embodiment does not add other application program interfaces, nor does it add other private data types; it only automatically replaces the framework's native plaintext operators with cryptographic operators at the bottom of the TensorFlow framework. Therefore, for users, coding a privacy machine learning algorithm with this solution is consistent with coding a plaintext machine learning algorithm. Therefore, compared with the solutions implemented by existing privacy frameworks, the ease of use of this solution is higher.
  • Step 6 The framework developer uses C/C++ to write the source code for registering the static optimizer program and/or the dynamic optimizer program in the TensorFlow framework.
  • Step 7 Build a binary file from the compiled source code for implementing the cryptographic operators, the source code for registering the cryptographic operators, the source code for implementing the cryptographic operator gradient functions, the source code for registering the cryptographic operator gradient functions, the source code for implementing the optimizer program, and the source code for registering the optimizer program, for example, generating Rosetta.so or Rosetta.dll.
  • Step 8 Based on the obtained binary file, make a python pip installation package, such as Rosetta.whl.
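  • A minimal setup.py along these lines could bundle the prebuilt shared object into a wheel (package and file names are placeholders taken from the example in this document; the application does not describe its build at this level of detail):

```python
# setup.py - build the wheel with:  python setup.py bdist_wheel
from setuptools import setup

setup(
    name="Rosetta",
    version="0.1.0",
    packages=["Rosetta"],
    package_data={"Rosetta": ["libRosetta.so"]},  # ship the compiled binary inside the package
    include_package_data=True,
    install_requires=["tensorflow"],
)
```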
  • Step 9 Send the generated installation package to each data holder among the multiple data holders.
  • Step 10 Each data holder installs the obtained installation package, and adds a line of code import Rosetta to the plaintext machine learning model in each data holder, and then the installed file can be imported into the plaintext machine learning model to obtain The corresponding privacy machine learning model.
  • Step 11 Multiple data holders execute secure multi-party calculations to jointly train the private machine learning model based on the private sample data stored in each data holder to obtain the target machine learning model.
  • the target machine learning model can be used for subsequent prediction of data to be predicted.
  • FIG. 3 shows a block diagram of a data processing method for implementing privacy protection provided in an embodiment of this specification.
  • the block diagram mainly includes the following modules: Rosetta Static Pass (static optimizer module), Rosetta Dynamic Pass (dynamic optimizer module) and Rosetta Crypto Ops (cryptographic operator module).
  • Rosetta Static Pass is a static graph optimization module based on the python programming interface. It is responsible for replacing the corresponding TF Native OPs in the TensorFlow graph with the corresponding cryptographic operators (Crypto OPs) before the execution of the TensorFlow graph.
  • These Crypto OPs can include encryption operators such as secure multi-party computing operators (MPC Ops), homomorphic encryption operators (HE Ops), and zero-knowledge proof operators (ZK Ops).
  • which cryptographic operator is used for the replacement depends on the user's configuration: if the user configures secure multi-party computation, TF Native OPs are replaced with MPC Ops; similarly, if the user configures HE or ZK, HE Ops or ZK Ops are used to replace TF Native Ops. It can all be implemented in C/C++.
  • Rosetta Dynamic Pass is a dynamic graph optimization module based on the C/C++ programming interface. It is responsible for replacing the corresponding TF Native OPs in the TensorFlow graph with the corresponding Crypto OPs when the TensorFlow graph is executed. It can all be implemented in C/C++.
  • Rosetta Crypto Ops is a module used to implement various types of Crypto Ops, which can include encryption operators such as MPC Ops, HE Ops, ZK Ops, etc. These Crypto Ops correspond to TF Native Ops and can all be implemented in C/C++.
  • the embodiment of this specification also provides a data processing device for realizing privacy protection.
  • the device is located in each of the multiple data holders, and each of the multiple data holders stores its own private sample data, as described in the following embodiment. Since the principle by which the data processing device for realizing privacy protection solves the problem is similar to that of the data processing method for realizing privacy protection, the implementation of the device can refer to the implementation of the method, and the repetition will not be repeated here.
  • the term "unit” or “module” can be a combination of software and/or hardware that implements a predetermined function.
  • Fig. 4 is a structural block diagram of a data processing device for implementing privacy protection according to an embodiment of this specification. As shown in Fig. 4, it includes: a replacement module 401 and a training module 402. The structure is described below.
  • the replacement module 401 is configured to replace the plaintext operator in the preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model.
  • the training module 402 is used to perform secure multi-party calculations to jointly train the private machine learning model based on the respective private sample data stored in each data holder, and output the target machine learning model.
  • the replacement module can be used to: obtain multiple cryptographic operators and register each of the multiple cryptographic operators; and obtain an optimizer program and register the optimizer program, where the optimizer program is used to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
  • the optimizer program is also used to replace the plaintext operator gradient function in the preset plaintext machine learning model with the corresponding cryptographic operator gradient function; accordingly, the replacement module can also be used to: after obtaining the multiple cryptographic operators and registering each of them, obtain the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators, register the cryptographic operator gradient function corresponding to each cryptographic operator, and associate the registered cryptographic operator gradient function with the corresponding registered cryptographic operator.
  • the replacement module can be used to: obtain an installation package file, where the installation package file is constructed based on a binary file, and the binary file is compiled from the source code used to implement the multiple cryptographic operators, the source code used to implement the optimizer program, the source code used to register the multiple cryptographic operators, and the source code used to register the optimizer program, the optimizer program, when executed, being used to replace the plaintext operator in the plaintext model with the corresponding cryptographic operator; install the installation package file; and import the installed file into the preset plaintext machine learning model to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
  • the replacement module can be used to: determine the plaintext operators that need to be replaced in the preset plaintext machine learning model based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model; and replace the plaintext operators that need to be replaced in the preset plaintext machine learning model with the corresponding cryptographic operators.
  • the optimizer program includes at least one of the following: a static optimizer program and a dynamic optimizer program.
  • the cryptographic operator includes at least one of the following: a secure multi-party computing operator, a homomorphic encryption operator, and a zero-knowledge proof operator.
  • the embodiments of this specification achieve the following technical effects: only need to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator to obtain the corresponding private machine learning Model, after multiple data holders perform secure multi-party calculations, they can jointly train the private machine learning model based on their respective private sample data to obtain the target machine learning model, which can effectively protect the private sample data of each data holder Privacy.
  • the above solution does not need to add other application program interfaces, nor does it need to add other private data types; it only needs to replace the plaintext operator in the plaintext machine learning model with the corresponding cryptographic operator, which is convenient and simple to operate and easy to use.
  • the embodiment of this specification also provides a computer device.
  • the computer device may specifically include an input device 51, a processor 52, and a memory 53.
  • the memory 53 is used to store processor executable instructions.
  • the processor 52 executes the instructions, the steps of the data processing method for implementing privacy protection described in any of the foregoing embodiments are implemented.
  • the input device may specifically be one of the main devices for information exchange between the user and the computer system.
  • the input device may include a keyboard, a mouse, a camera, a scanner, a light pen, a handwriting input board, a voice input device, etc.; the input device is used to input raw data and the programs that process the data into the computer.
  • the input device can also obtain and receive data transmitted from other modules, units, and devices.
  • the processor can be implemented in any suitable way.
  • the processor may take the form of a microprocessor or a processor and a computer readable medium, logic gates, switches, application specific integrated circuits ( Application Specific Integrated Circuit, ASIC), programmable logic controller and embedded microcontroller form, etc.
  • the memory may specifically be a memory device used to store information in modern information technology.
  • the memory can include multiple levels. In a digital system, as long as it can store binary data, it can be a memory; in an integrated circuit, a circuit with a storage function without a physical form is also called a memory, such as RAM, FIFO, etc.; In the system, storage devices in physical form are also called memory, such as memory sticks, TF cards, and so on.
  • the embodiment of this specification also provides a computer storage medium based on a data processing method for realizing privacy protection.
  • the computer storage medium stores computer program instructions, and when the computer program instructions are executed, the steps of the data processing method for realizing privacy protection described in the above embodiments are implemented.
  • the above-mentioned storage medium includes, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), cache (Cache), and hard disk (Hard Disk Drive, HDD). Or memory card (Memory Card).
  • the memory can be used to store computer program instructions.
  • the network communication unit may be an interface set up in accordance with a standard stipulated by the communication protocol and used for network connection communication.
  • the modules or steps of the above-mentioned embodiments of this specification can be implemented by a general-purpose computing device, and they can be concentrated on a single computing device or distributed across multiple computing devices.
  • optionally, they can be implemented with program code executable by the computing device, so that they can be stored in a storage device and executed by the computing device, and in some cases the steps shown or described can be executed in an order different from the order here, or they can be separately made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. In this way, the embodiments of this specification are not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This specification provides a data processing method and apparatus for realizing privacy protection. The method is executed by multiple data holders, each of which stores its own private sample data. The method includes: replacing the plaintext operators in a preset plaintext machine learning model with the cryptographic operators corresponding to the plaintext operators, to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model; and performing secure multi-party computation to jointly train the privacy machine learning model based on the private sample data stored by each data holder, and outputting a target machine learning model. The above solution only needs to replace the plaintext operators in the plaintext machine learning model with the corresponding cryptographic operators to protect the private sample data from being leaked; it is convenient and simple to operate and easy to use.

Description

Data processing method and apparatus for realizing privacy protection
Technical Field
This application relates to the field of data security technology, and in particular to a data processing method and apparatus for realizing privacy protection.
Background Art
With the introduction of the back-propagation algorithm for artificial neural networks, a wave of machine learning has arisen. Machine learning requires large amounts of sample data, and people have begun to worry that their private data, once collected, may be leaked or misused. Therefore, how to properly protect private data is crucial to the future of machine learning.
At present, combining cryptography with AI (Artificial Intelligence) can solve the privacy protection problems in today's data industry. As a result, various frameworks based on encrypted machine learning (for example, TF-Encrypted, PySyft, etc.) have emerged. These encrypted machine learning frameworks are similar to mainstream AI frameworks (for example, Tensorflow, PyTorch, etc.) in that they exploit the ease of use of the framework APIs (Application Programming Interfaces), while training on and predicting over encrypted data by means of various cryptographic algorithms such as MPC (Secure Multi-Party Computation) and HE (Homomorphic Encryption). However, these encrypted machine learning frameworks have shortcomings in usability, extensibility, and performance; usability in particular will hinder their adoption.
No effective solution to the above problems has yet been proposed.
Summary of the Invention
The embodiments of this specification provide a data processing method and apparatus for realizing privacy protection, so as to solve the problem of poor usability of privacy-preserving machine learning frameworks in the prior art.
The embodiments of this specification provide a data processing method for realizing privacy protection, executed by multiple data holders, each of which stores its own private sample data. The method includes: replacing the plaintext operators in a preset plaintext machine learning model with the cryptographic operators corresponding to the plaintext operators, to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model; and performing secure multi-party computation to jointly train the privacy machine learning model based on the private sample data stored by each data holder, and outputting a target machine learning model.
In one embodiment, replacing the plaintext operators in the preset plaintext machine learning model with the cryptographic operators corresponding to the plaintext operators includes: obtaining multiple cryptographic operators and registering each of the multiple cryptographic operators; and obtaining an optimizer program and registering the optimizer program, where the optimizer program is used to replace the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators.
In one embodiment, the optimizer program is further used to replace the plaintext operator gradient functions in the preset plaintext machine learning model with the corresponding cryptographic operator gradient functions. Correspondingly, after obtaining the multiple cryptographic operators and registering each of them, the method further includes: obtaining the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators and registering the cryptographic operator gradient function corresponding to each cryptographic operator; and associating each registered cryptographic operator gradient function with the corresponding registered cryptographic operator.
In one embodiment, replacing the plaintext operators in the preset plaintext machine learning model with the cryptographic operators corresponding to the plaintext operators includes: obtaining an installation package file, where the installation package file is built from a binary file, the binary file is compiled from the source code for implementing the multiple cryptographic operators, the source code for implementing the optimizer program, the source code for registering the multiple cryptographic operators, and the source code for registering the optimizer program, and the optimizer program, when executed, replaces the plaintext operators in the plaintext model with the cryptographic operators corresponding to the plaintext operators; installing the installation package file; and importing the installed files into the preset plaintext machine learning model to replace the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators.
In one embodiment, replacing the plaintext operators in the preset plaintext machine learning model with the cryptographic operators corresponding to the plaintext operators includes: determining, based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model, the plaintext operators that need to be replaced in the preset plaintext machine learning model; and replacing the plaintext operators that need to be replaced in the preset plaintext machine learning model with the corresponding cryptographic operators.
In one embodiment, the optimizer program includes at least one of the following: a static optimizer program and a dynamic optimizer program.
In one embodiment, the cryptographic operators include at least one of the following: secure multi-party computation operators, homomorphic encryption operators, and zero-knowledge proof operators.
The embodiments of this specification also provide a data processing apparatus for realizing privacy protection, located in each of multiple data holders, each of which stores its own private sample data. The apparatus includes: a replacement module, configured to replace the plaintext operators in a preset plaintext machine learning model with the cryptographic operators corresponding to the plaintext operators, to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model; and a training module, configured to perform secure multi-party computation to jointly train the privacy machine learning model based on the private sample data stored by each data holder, and output a target machine learning model.
The embodiments of this specification also provide a computer device, including a processor and a memory for storing instructions executable by the processor, where the processor, when executing the instructions, implements the steps of the data processing method for realizing privacy protection described in any of the above embodiments.
The embodiments of this specification also provide a computer-readable storage medium on which computer instructions are stored, and when the instructions are executed, the steps of the data processing method for realizing privacy protection described in any of the above embodiments are implemented.
In the embodiments of this specification, a data processing method for realizing privacy protection is provided, which is executed by multiple data holders, each of which stores its own private sample data. Each data holder replaces the plaintext operators in a preset plaintext machine learning model with the corresponding cryptographic operators to obtain a privacy machine learning model; then the multiple data holders perform secure multi-party computation to jointly train the privacy machine learning model based on their respective private sample data and output a target machine learning model. In the above solution, the corresponding privacy machine learning model is obtained simply by replacing the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators; the multiple data holders then perform secure multi-party computation to jointly train the privacy machine learning model based on their respective private sample data and obtain the target machine learning model, so the privacy of each data holder's private sample data can be effectively protected. In addition, the above solution requires neither additional application programming interfaces nor additional private data types: it only needs the plaintext operators in the plaintext machine learning model to be replaced with the corresponding cryptographic operators, so it is convenient and simple to operate and easy to use. The above solution solves the technical problem of poor usability of existing privacy-preserving machine learning frameworks, and achieves the technical effect of effectively protecting data privacy while improving the usability of the machine learning framework.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of this application and constitute a part of this application, but do not limit this application. In the drawings:
FIG. 1 is a schematic diagram of an application scenario of a data processing method for realizing privacy protection in an embodiment of this specification;
FIG. 2 is a flowchart of a data processing method for realizing privacy protection in an embodiment of this specification;
FIG. 3 is a block diagram of a data processing method for realizing privacy protection in an embodiment of this specification;
FIG. 4 is a schematic diagram of a data processing apparatus for realizing privacy protection in an embodiment of this specification;
FIG. 5 is a schematic diagram of a computer device in an embodiment of this specification.
Detailed Description of Embodiments
The principles and spirit of this application will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are given only to enable those skilled in the art to better understand and implement this application, and are not intended to limit the scope of this application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of this disclosure to those skilled in the art.
Those skilled in the art will understand that the embodiments of this specification may be implemented as a system, an apparatus or device, a method, or a computer program product. Therefore, this disclosure may be implemented in the following forms: entirely hardware, entirely software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.
In an example scenario, a plaintext machine learning model can be written in a machine learning framework. The plaintext machine learning model may include the native plaintext operators provided by the machine learning framework. In order to protect the privacy of the sample data when it is used to train the machine learning model, the native plaintext operators in the plaintext machine learning model can be replaced with the corresponding cryptographic operators to obtain the corresponding privacy machine learning model. This replacement is transparent to the user. Afterwards, the privacy machine learning model can be trained based on the private sample data to obtain the target machine learning model.
Please refer to FIG. 1, which shows a schematic diagram of the above example scenario. FIG. 1 exemplarily shows three data holders: data holder 1, data holder 2, and data holder 3. Data holder 1 stores first private sample data, data holder 2 stores second private sample data, and data holder 3 stores third private sample data. Data holder 1, data holder 2, and data holder 3 can obtain the preset plaintext machine learning model to be trained, and replace the plaintext operators in the preset plaintext machine learning model with cryptographic operators to obtain the privacy machine learning model corresponding to the preset plaintext machine learning model. Then, data holder 1, data holder 2, and data holder 3 can perform secure multi-party computation to jointly train the privacy machine learning model based on their respective private sample data and obtain the target machine learning model. Jointly training the privacy machine learning model through secure multi-party computation can protect the private sample data of each data holder from being leaked and effectively protect data privacy.
In this example scenario, the preset plaintext machine learning model may be implemented based on a plaintext machine learning framework. The plaintext machine learning framework may be any existing plaintext machine learning framework, for example, TensorFlow, Pytorch, MxNet, CNTK-Azure, and other frameworks. Therefore, this specification does not limit the specific plaintext machine learning framework used to generate the plaintext machine learning model, and it can be selected according to actual needs.
In this example scenario, the cryptographic operator may be any cryptographic operator that can provide privacy protection for the input data of all parties in a scenario where two or more data holders jointly (or collaboratively) perform machine learning training and prediction. For example, in some exemplary embodiments, the cryptographic operator may be a secure multi-party computation (Secure Multi-Party Computation, MPC) operator, a homomorphic encryption (Homomorphic Encryption, HE) operator, a zero-knowledge proof (Zero-Knowledge Proof, ZKP) operator, or the like. Likewise, this specification does not limit the specific cryptographic operators used, and they can be selected according to actual needs.
In this example scenario, specifically, developers can write the source code for implementing the cryptographic operators corresponding to the native plaintext operators in the machine learning framework and the source code for registering the cryptographic operators in the machine learning framework, write the source code for implementing the gradient functions of the cryptographic operators and the source code for registering the cryptographic operator gradient functions in the machine learning framework, and write the source code for implementing the optimizer program and the source code for registering the optimizer program in the machine learning framework. The optimizer program is used to replace the plaintext operators in the plaintext machine learning model with the corresponding cryptographic operators. The developers can then compile the source code for implementing the cryptographic operators, the source code for registering the cryptographic operators, the source code for implementing the cryptographic operator gradient functions, the source code for registering the cryptographic operator gradient functions, the source code for implementing the optimizer program, and the source code for registering the optimizer program into a binary file. Afterwards, a python installation package is generated based on the binary file (for example, with the file name Rosetta.whl). Data holder 1, data holder 2, and data holder 3 can download and install the installation package. Users at data holder 1, data holder 2, and data holder 3 only need to add one line of code, "import Rosetta", under the plaintext machine learning model they have written to support the privacy protection function. Afterwards, secure multi-party computation can be performed to jointly train the model based on the private sample data and obtain a trained target machine learning model. After the target machine learning model is obtained, it can be used to make predictions.
In this example scenario, the target machine learning model obtained by the data holders through joint training may be a complete machine learning model, that is, the model parameters are complete. In this case, each data holder can obtain the data to be predicted and input the data to be predicted into the trained complete machine learning model to obtain a prediction result. For example, data holder 1 can obtain the data to be predicted, and the prediction result can be obtained by inputting the data to be predicted into the complete machine learning model.
In this example scenario, the target machine learning model obtained by the data holders through joint training may be an incomplete machine learning model, that is, the model parameters are incomplete. In this case, one of the multiple data holders can obtain the data to be predicted, and this data holder performs secure multi-party computation with the other data holders to obtain the joint prediction result for the data to be predicted. For example, data holder 1 can obtain the data to be predicted; then data holder 1, data holder 2, and data holder 3 perform secure multi-party computation to make a joint prediction based on the data to be predicted and the target machine learning model, obtaining a joint prediction result.
The embodiments of this specification provide a data processing method for realizing privacy protection, which is executed by multiple data holders, each of which stores its own private sample data. FIG. 2 shows a schematic diagram of an application scenario of a data processing method for realizing privacy protection in an embodiment of this specification. Although this specification provides the method operation steps or device structures shown in the following embodiments or drawings, more or fewer operation steps or module units may be included in the method or device based on conventional practice or without creative effort. For steps or structures between which there is logically no necessary causal relationship, the execution order of these steps or the module structure of the device is not limited to the execution order or module structure described in the embodiments of this specification and shown in the drawings. When the described method or module structure is applied in an actual device or terminal product, it can be executed sequentially or in parallel according to the method or module structure shown in the embodiments or drawings (for example, in a parallel-processor or multi-threaded processing environment, or even a distributed processing environment).
Specifically, as shown in FIG. 2, the data processing method for realizing privacy protection provided by an embodiment of this specification may include the following steps.
Step S201: replace the plaintext operators in a preset plaintext machine learning model with the cryptographic operators corresponding to the plaintext operators, to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model.
Specifically, each of the multiple data holders may store a preset plaintext machine learning model. The preset plaintext machine learning model may be a machine learning model built by a user in a machine learning framework. A machine learning framework may refer to any machine learning system or method that includes machine learning algorithms, and may include methods for representing and processing data, methods for expressing and building predictive models, and methods for evaluating and using modeling results. The machine learning framework may include one of the following: TensorFlow, Pytorch, MxNet, CNTK-Azure, and other frameworks. The machine learning framework may include multiple native plaintext operators. A preset plaintext machine learning model built on the machine learning framework may include the native plaintext operators of that framework (plaintext operators for short).
Each of the multiple data holders may store its own private sample data. To ensure that its private sample data is not leaked to the other data holders, each data holder may replace the plaintext operators in the preset plaintext machine learning model with the cryptographic operators corresponding to the plaintext operators, to obtain the privacy machine learning model corresponding to the preset plaintext machine learning model. A cryptographic operator corresponding to a plaintext operator is an operator that implements the operation of that plaintext operator under encryption.
Step S202: perform secure multi-party computation to jointly train the privacy machine learning model based on the private sample data stored by each data holder, and output a target machine learning model.
After each data holder obtains the privacy machine learning model, the multiple data holders may perform secure multi-party computation to jointly train the privacy machine learning model based on the private sample data stored by each data holder. Through the joint training of the privacy machine learning model, the model parameters of the privacy machine learning model can be determined and a trained target machine learning model obtained. After the target machine learning model is output, each data holder can process data to be predicted based on the target machine learning model to obtain a prediction result.
With the method of the above embodiment, the corresponding privacy machine learning model is obtained simply by replacing the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators; the multiple data holders then perform secure multi-party computation to jointly train the privacy machine learning model based on their respective private sample data and obtain the target machine learning model, so the privacy of each data holder's private sample data can be effectively protected. In addition, the above solution requires neither additional application programming interfaces nor additional private data types; it only needs the plaintext operators in the plaintext machine learning model to be replaced with the corresponding cryptographic operators, so it is convenient and simple to operate and easy to use.
In some embodiments of this specification, replacing the plaintext operators in the preset plaintext machine learning model with the cryptographic operators corresponding to the plaintext operators may include: obtaining multiple cryptographic operators and registering each of them; and obtaining an optimizer program and registering it, where the optimizer program is used to replace the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators.
Specifically, each data holder may obtain multiple cryptographic operators, each of which may correspond to one of the plaintext operators in the machine learning framework. That is, the obtained multiple cryptographic operators may include the cryptographic operators corresponding to the plaintext operators in the preset plaintext machine learning model. For example, each data holder may receive multiple cryptographic operators input by the user. As another example, the multiple cryptographic operators may be stored on a server; each data holder may send an acquisition request to the server, and the server sends the multiple cryptographic operators to each data holder in response to the request.
After obtaining the multiple cryptographic operators, each data holder may register each cryptographic operator into the machine learning framework. For example, each data holder may register each of the multiple cryptographic operators into the machine learning framework based on the operator registration interface provided by the framework.
Each data holder may obtain the optimizer program, which is used to replace the plaintext operators in the preset plaintext machine learning model built on the machine learning framework with the corresponding cryptographic operators. For example, each data holder may receive an optimizer program input by the user. As another example, the optimizer program may be stored on a server; each data holder may send an acquisition request to the server, and the server sends the optimizer program to each data holder in response to the request.
After obtaining the optimizer program, each data holder may register the optimizer program into the machine learning framework. For example, each data holder may register the optimizer program into the machine learning framework based on the optimizer registration interface provided by the framework.
With the method of the above embodiment, by registering the multiple cryptographic operators and the optimizer program into the machine learning framework, the plaintext operators in a preset plaintext machine learning model built on that framework can be replaced with the corresponding cryptographic operators, so as to obtain the privacy machine learning model corresponding to the preset plaintext machine learning model.
在本说明书一些实施例中,优化器程序可以包括以下至少之一:静态优化器程序和动态优化器程序。其中,静态优化器程序是指在明文机器学习模型中的数据流图执行之前,将机器学习模型中的明文算子替换为对应的密码算子。动态优化器程序是指在明文机器学习模型中的数据流图执行的过程中,将明文机器学习模型中的明文算子替换为对应的密码算子。优化器程序可以包括静态优化器程序,也可以包括动态优化器程序,或者可以包括动态优化器程序和静态优化器程序两者。上述实施例中,优化器程序可以包括静态优化器程序和动态优化器程序中的至少之一,灵活性高。
In some embodiments of this specification, the optimizer program may also be used to replace the plaintext operator gradient functions in the preset plaintext machine learning model with the corresponding cryptographic operator gradient functions; correspondingly, after obtaining the multiple cryptographic operators and registering each of the multiple cryptographic operators, the method may further include: obtaining the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators, and registering the cryptographic operator gradient function corresponding to each cryptographic operator; and associating the registered cryptographic operator gradient functions with the corresponding registered cryptographic operators.
Specifically, considering that machine learning involves the back-propagation algorithm and that back propagation involves gradient functions, the plaintext operators in the preset plaintext machine learning model need to be replaced with cryptographic operators, and the plaintext operator gradient functions in the preset plaintext machine learning model need to be replaced with cryptographic operator gradient functions. That is, the optimizer program may also be used to replace the plaintext operator gradient functions in the preset plaintext machine learning model with the corresponding cryptographic operator gradient functions.
After each data holder has registered the obtained multiple cryptographic operators into the machine learning framework, each data holder may obtain the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators. For example, each data holder may receive the cryptographic operator gradient functions corresponding to the cryptographic operators input by a user. For another example, the cryptographic operator gradient functions corresponding to the cryptographic operators may be stored in a server; each data holder may send an acquisition request to the server, and the server sends the cryptographic operator gradient functions corresponding to the cryptographic operators to each data holder in response to the acquisition request.
After obtaining the cryptographic operator gradient function corresponding to each cryptographic operator, each data holder may register the cryptographic operator gradient function corresponding to each cryptographic operator into the machine learning framework. For example, each data holder may register the cryptographic operator gradient functions into the machine learning framework based on the gradient function registration interface provided by the machine learning framework. Each data holder may also associate each cryptographic operator with the cryptographic operator gradient function corresponding to that cryptographic operator. By associating the cryptographic operators with the corresponding gradient functions, the machine learning framework can find the matching cryptographic operator gradient function according to a cryptographic operator when performing automatic differentiation.
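A minimal sketch of registering a gradient function and associating it with its cryptographic operator, assuming the TensorFlow framework and the hypothetical operator name CryptoMatMul used above; the body of the gradient function is illustrative only.

    import tensorflow as tf

    # Registering a gradient function under the cryptographic operator's name both
    # registers the function and associates it with that operator, so automatic
    # differentiation can find it during back propagation.
    @tf.RegisterGradient("CryptoMatMul")
    def _crypto_mat_mul_grad(op, grad):
        # Illustrative body: in practice the gradient itself would be computed with
        # cryptographic operators; native operators stand in here only to show the
        # shape of the computation (dX = dZ * W^T, dW = X^T * dZ).
        x, w = op.inputs
        return (tf.matmul(grad, w, transpose_b=True),
                tf.matmul(x, grad, transpose_a=True))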
In some embodiments of this specification, replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator may include: obtaining an installation package file, where the installation package file is built based on a binary file, the binary file is obtained by compiling the source code for implementing the multiple cryptographic operators, the source code for implementing the optimizer program, the source code for registering the multiple cryptographic operators and the source code for registering the optimizer program, and the optimizer program, when executed, is used to replace the plaintext operator in the plaintext model with the cryptographic operator corresponding to the plaintext operator; installing the installation package file; and importing the installed files into the preset plaintext machine learning model, so as to replace the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators.
The replacement of the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators can be achieved in the form of an installation package. Specifically, the source code for implementing the multiple cryptographic operators, the source code for registering the multiple cryptographic operators, the source code for implementing the optimizer program and the source code for registering the optimizer program can be compiled into a binary file. After that, an installation package file can be generated based on the compiled binary file. Each data holder can obtain the installation package file. For example, the installation package file may be stored in a server; each data holder may send a download request to the server, and the server sends the installation package file to each data holder in response to the download request. After obtaining the installation package file, each data holder can install the installation package file to obtain the installed files. The installed files may include code for interpreted execution as well as executable code. After that, each data holder can import the installed files into the plaintext machine learning model, thereby replacing the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators. In one implementation, the user of each data holder can import the installed files into the preset plaintext machine learning model by adding one line of code to the preset plaintext model (for example, import Rosetta, where Rosetta is the name of the installation package), so that the plaintext operators in the plaintext machine learning model are replaced with the corresponding cryptographic operators.
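As a further illustration, the following sketch shows what the installed package's import-time initialization might look like, so that the single import line triggers the registration and replacement described above. The file names and the division into steps are assumptions for illustration, not the actual content of the package.

    # Rosetta/__init__.py -- illustrative sketch of the package's import-time work.
    import os
    import tensorflow as tf

    _HERE = os.path.dirname(__file__)

    # 1. Register the cryptographic operators by loading the compiled binary file.
    _lib = tf.load_op_library(os.path.join(_HERE, "Rosetta.so"))

    # 2. Register the cryptographic operator gradient functions and associate them
    #    with the operators (as sketched earlier in this specification).
    # 3. Register the static and/or dynamic optimizer program so that plaintext
    #    operators are replaced before or while the data flow graph executes.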
In the method of the above embodiment, the privacy machine learning model corresponding to the plaintext machine learning model can be obtained simply by adding one line of code to the preset plaintext machine learning model. The operation is simple and convenient, no additional application programming interfaces or privacy data types need to be added, and the usability is good. Moreover, a plaintext machine learning model can be ported extremely conveniently by the above method. In other privacy machine learning frameworks, porting a plaintext machine learning model requires rewriting it with the corresponding privacy data types and privacy application programming interfaces, whereas this solution only needs one import line added to the source code of the plaintext model to support privacy protection, so the portability is extremely high. In addition, the model can be extended conveniently in the above manner. In other privacy machine learning frameworks, extending support to another cryptographic algorithm requires, besides implementing the additional cryptographic operators, adding corresponding application programming interfaces and privacy data types, whereas this solution only needs to replace the plaintext operators with the corresponding cryptographic operators, without adding extra application programming interfaces or privacy data types.
In some embodiments of this specification, replacing the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator may include: determining, based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model, the plaintext operators that need to be replaced in the preset plaintext machine learning model; and replacing the plaintext operators that need to be replaced in the preset plaintext machine learning model with the corresponding cryptographic operators.
Specifically, before replacing the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators, the plaintext operators that need to be replaced in the preset plaintext machine learning model can be determined first. The plaintext operators that need to be replaced can be determined based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model. The data flow graph can be used to characterize the data flow information in the plaintext machine learning model. For example, in the TensorFlow machine learning framework, the data flow graph is a tensor flow graph: the nodes of the graph represent mathematical operations, and the edges represent the multidimensional data arrays, namely tensors, passed between the nodes.
In one implementation, considering that the plaintext operators in the preset plaintext machine learning model are replaced with the corresponding cryptographic operators in order to protect the privacy of the private sample data stored by each data holder, the operators through which the private sample data flows may be determined as the plaintext operators to be replaced. In another implementation, considering that the private sample data is used to train the model so as to obtain the model parameters (also called training variables), the operators through which the training variables flow may be determined as the plaintext operators to be replaced.
After the plaintext operators to be replaced have been determined, the plaintext operators that need to be replaced in the preset plaintext machine learning model can be replaced with the corresponding cryptographic operators. In the above embodiment, the plaintext operators to be replaced can be determined based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model, so that the replacement is completed.
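A minimal sketch of this data-flow-based selection, assuming the TensorFlow framework: starting from the graph nodes taken as sources of private data (placeholders are used here purely as an illustrative choice of source), the operators reachable downstream of them are collected as the operators to be replaced.

    import tensorflow as tf
    from collections import defaultdict, deque

    def ops_to_replace(graph_def, source_ops=("Placeholder",)):
        # Collect the names of all nodes reachable from the private-data sources.
        consumers = defaultdict(list)          # producer node name -> consumer nodes
        for node in graph_def.node:
            for inp in node.input:
                consumers[inp.split(":")[0].lstrip("^")].append(node)
        queue = deque(n.name for n in graph_def.node if n.op in source_ops)
        reached = set()
        while queue:
            name = queue.popleft()
            for consumer in consumers[name]:
                if consumer.name not in reached:
                    reached.add(consumer.name)
                    queue.append(consumer.name)
        return reached

    # Usage: names = ops_to_replace(tf.get_default_graph().as_graph_def())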
In order to realize privacy protection of the private sample data, a cryptographic operator may be any cryptographic operator that can provide privacy protection for each party's input data in a scenario where two or more data holders jointly (or collaboratively) perform machine learning model training and prediction. For example, in some embodiments of this specification, the cryptographic operators may include secure multiparty computation operators (MPC OPs), homomorphic encryption operators (HE OPs), zero-knowledge proof operators (ZK OPs), or the like. The secure multiparty computation operators may include one or more of Garbled Circuit, Oblivious Transfer and Secret Sharing. The above cryptographic operators are only exemplary; this specification does not limit which cryptographic operators are specifically adopted, and they can be selected as required.
Correspondingly, the secure multiparty computation performed in step S202 may be performed based on the cryptographic operators substituted in step S201. The secure multiparty computation in step S202 may be secure multiparty computation in a broad sense, and may include at least one of MPC, HE and ZK. For example, in the case where the plaintext operators in the preset plaintext machine learning model are replaced with MPC OPs, the multiple data holders perform MPC computation when training the model. In the case where the plaintext operators are replaced with HE OPs, the multiple data holders perform HE computation when training the model. In the case where the plaintext operators are replaced with ZK OPs, the multiple data holders perform ZK computation when training the model.
In some embodiments of this specification, the cryptographic operators may be implemented in the C language or the C++ language. By implementing the cryptographic operators in C or C++, the performance of the machine learning model can be improved. Compared with other privacy machine learning solutions in which the cryptographic algorithms are implemented in python, the cryptographic operators in this solution are implemented in C/C++, so the performance is high. Of course, the cryptographic operators may also be implemented in other languages, such as the Python language.
The above method is described below with reference to a specific embodiment. However, it is worth noting that this specific embodiment is only intended to better illustrate this application and does not constitute an undue limitation on this application.
This specific embodiment is described by taking the TensorFlow framework as an example. In this specific embodiment, the data processing method for realizing privacy protection may include the following steps.
Step 1: the framework developers write, in C/C++, the source code for implementing, on various devices, the cryptographic operators (Crypto OPs) corresponding to the native plaintext operators (TF Native OPs) of the TensorFlow framework. Since the TensorFlow framework supports distributed data processing, it can be executed in a distributed manner on a variety of devices. The devices may include a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a TPU (Tensor Processing Unit), or the like. Since the instruction sets for implementing the cryptographic operators differ across devices, cryptographic operators suitable for the different devices need to be implemented.
Step 2: the framework developers write, in C/C++, the source code for registering the cryptographic operators in the TensorFlow framework.
Step 3: the framework developers write, in C/C++, the source code for implementing the cryptographic operator gradient functions (Crypto OP Gradient Functions) corresponding to the cryptographic operators.
Step 4: the framework developers write, in C/C++, the source code for registering the cryptographic operator gradient functions in the TensorFlow framework and for associating the cryptographic operators with the cryptographic operator gradient functions. Associating the corresponding cryptographic operators with the cryptographic operator gradient functions enables the Tensorflow framework to automatically find the matching cryptographic operator gradient function according to a cryptographic operator when performing automatic differentiation.
Since the cryptographic operators in the method of this specific embodiment are all implemented in C/C++, the performance of this solution is better than that of other privacy machine learning frameworks whose cryptographic algorithms are implemented in python.
In addition, if another privacy machine learning framework is to be extended to support an additional cryptographic algorithm, such as ZK (a zero-knowledge cryptographic algorithm), the framework needs to be extended: in addition to implementing the zero-knowledge cryptographic operators (ZK Ops), corresponding privacy application programming interfaces and privacy data types also need to be added. This solution, however, only needs the ZK Ops to be added, without modifying the framework or adding other privacy data types, so this solution has good extensibility.
Step 5: the framework developers write, in C/C++, the optimizer program for replacing the native plaintext operators in the TensorFlow Graph (tensor flow graph) corresponding to the plaintext machine learning model with the corresponding cryptographic operators. The optimizer program may include a static optimizer program (Static Pass), a dynamic optimizer program (Dynamic Pass), or both. The function of the static optimizer program is to perform the replacement before the TensorFlow Graph is executed. The function of the dynamic optimizer program is to perform the replacement while the TensorFlow Graph is being executed. Combining the static optimizer program and the dynamic optimizer program can improve efficiency and flexibility.
Compared with other privacy machine learning frameworks, the method of this specific embodiment does not add other application programming interfaces or other privacy data types; it only uses the cryptographic operators to automatically replace the framework's native plaintext operators at the bottom layer of the TensorFlow framework. Therefore, for the user, with this solution, using a privacy machine learning algorithm is exactly the same as using plaintext machine learning in terms of algorithm coding. Compared with solutions implemented by existing privacy frameworks, this solution therefore has higher usability.
Step 6: the framework developers write, in C/C++, the source code for registering the static optimizer program and/or the dynamic optimizer program in the TensorFlow framework.
Step 7: based on the completed source code for implementing the cryptographic operators, the source code for registering the cryptographic operators, the source code for implementing the cryptographic operator gradient functions, the source code for registering the cryptographic operator gradient functions, the source code for implementing the optimizer program and the source code for registering the optimizer program, build a binary file, for example generating Rosetta.so or Rosetta.dll.
Step 8: based on the obtained binary file, build a python pip installation package, such as Rosetta.whl.
Step 9: send the generated installation package to each of the multiple data holders.
Step 10: each data holder installs the obtained installation package and adds one line of code, import Rosetta, to its plaintext machine learning model, so that the installed files are imported into the plaintext machine learning model and the corresponding privacy machine learning model is obtained.
Compared with other privacy machine learning frameworks, a plaintext model can be ported extremely conveniently using this solution. In other privacy machine learning frameworks, porting a plaintext model requires rewriting it with the corresponding privacy data types and privacy application programming interfaces. With this solution, however, only one line, import Rosetta, needs to be added to the source code of the plaintext model to support the privacy solution, so the portability is extremely high.
Step 11: the multiple data holders perform secure multiparty computation to jointly train the privacy machine learning model based on the private sample data stored by each data holder, and a target machine learning model is obtained. The target machine learning model can be used for subsequent prediction on data to be predicted.
In this solution, by having the multiple data holders perform secure multiparty computation, the privacy machine learning model can be jointly trained based on their respective private sample data to obtain the target machine learning model, so that the privacy of each data holder's private sample data can be effectively protected.
Please refer to Fig. 3, which shows a block diagram of the data processing method for realizing privacy protection provided in an embodiment of this specification. As shown in Fig. 3, the training and prediction of the privacy machine learning model is implemented in combination with the TensorFlow framework, and the block diagram mainly includes the following modules: Rosetta Static Pass (static optimizer module), Rosetta Dynamic Pass (dynamic optimizer module) and Rosetta Crypto Ops (cryptographic operator module). In the block diagram shown in Fig. 3, apart from the above Rosetta Static Pass, Rosetta Dynamic Pass and Rosetta Crypto Ops, which are newly added modules, the other modules all reuse modules of the Tensorflow framework and are not described in detail here.
Rosetta Static Pass is a static graph optimization module exposed through the python programming interface; it is responsible for replacing the corresponding TF Native OPs in the TensorFlow graph with the corresponding cryptographic operators (Crypto OPs) before the TensorFlow graph is executed. These Crypto OPs may include secure multiparty computation operators (MPC Ops), homomorphic encryption operators (HE Ops), zero-knowledge proof operators (ZK Ops) and other cryptographic operators. Which cryptographic operators are substituted depends on the user's configuration: if the user's configuration is secure multiparty computation, the TF Native OPs are replaced with MPC Ops; similarly, if the user's configuration is HE or ZK, the TF Native Ops are replaced with HE Ops or ZK Ops. The module can be implemented entirely in C/C++.
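A minimal sketch of the configuration-driven choice described above: the protocol configured by the user decides which family of cryptographic operator types the pass substitutes. The protocol keys and the operator-name prefixes are illustrative assumptions.

    # Illustrative protocol-to-prefix table; the resulting operator names are
    # hypothetical and assume the corresponding Crypto OPs have been registered.
    PROTOCOL_PREFIX = {"MPC": "Mpc", "HE": "He", "ZK": "Zk"}

    def select_replacement(native_op_type, protocol="MPC"):
        # Map a TF Native OP type to the crypto op type for the configured protocol.
        prefix = PROTOCOL_PREFIX[protocol]
        return prefix + native_op_type           # e.g. "MatMul" -> "MpcMatMul"

    # Usage inside a pass: node.op = select_replacement(node.op, protocol="MPC")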
Rosetta Dynamic Pass is a dynamic graph optimization module based on the C/C++ programming interface; it is responsible for replacing the corresponding TF Native OPs in the TensorFlow graph with the corresponding Crypto OPs while the TensorFlow graph is being executed. It can be implemented entirely in C/C++.
Rosetta Crypto Ops is the module for implementing the various Crypto Ops and may include MPC Ops, HE Ops, ZK Ops and other cryptographic operators. These Crypto Ops correspond to the TF Native Ops and can all be implemented in C/C++.
Based on the same inventive concept, the embodiments of this specification also provide a data processing device for realizing privacy protection, which is located in each of multiple data holders, and each of the multiple data holders stores its own private sample data, as described in the following embodiments. Since the principle by which the data processing device for realizing privacy protection solves the problem is similar to that of the data processing method for realizing privacy protection, the implementation of the device can refer to the implementation of the method, and the repeated parts are not described again. As used below, the term "unit" or "module" may be a combination of software and/or hardware that realizes a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Fig. 4 is a structural block diagram of the data processing device for realizing privacy protection according to an embodiment of this specification. As shown in Fig. 4, the device includes a replacement module 401 and a training module 402, and the structure is described below.
The replacement module 401 is configured to replace the plaintext operator in the preset plaintext machine learning model with the cryptographic operator corresponding to the plaintext operator, to obtain the privacy machine learning model corresponding to the preset plaintext machine learning model.
The training module 402 is configured to perform secure multiparty computation, so as to jointly train the privacy machine learning model based on the private sample data stored by each data holder, and output a target machine learning model.
In some embodiments of this specification, the replacement module may be configured to: obtain multiple cryptographic operators and register each of the multiple cryptographic operators; and obtain an optimizer program and register the optimizer program, where the optimizer program is used to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
In some embodiments of this specification, the optimizer program is also used to replace the plaintext operator gradient functions in the preset plaintext machine learning model with the corresponding cryptographic operator gradient functions; correspondingly, the replacement module may also be configured to: after obtaining the multiple cryptographic operators and registering each of the multiple cryptographic operators, obtain the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators and register the cryptographic operator gradient function corresponding to each cryptographic operator; and associate the registered cryptographic operator gradient functions with the corresponding registered cryptographic operators.
In some embodiments of this specification, the replacement module may be configured to: obtain an installation package file, where the installation package file is built based on a binary file, the binary file is obtained by compiling the source code for implementing the multiple cryptographic operators, the source code for implementing the optimizer program, the source code for registering the multiple cryptographic operators and the source code for registering the optimizer program, and the optimizer program, when executed, is used to replace the plaintext operator in the plaintext model with the cryptographic operator corresponding to the plaintext operator; install the installation package file; and import the installed files into the preset plaintext machine learning model, so as to replace the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators.
In some embodiments of this specification, the replacement module may be configured to: determine, based on the data flow in the data flow graph corresponding to the preset plaintext machine learning model, the plaintext operators that need to be replaced in the preset plaintext machine learning model; and replace the plaintext operators that need to be replaced in the preset plaintext machine learning model with the corresponding cryptographic operators.
In some embodiments of this specification, the optimizer program includes at least one of the following: a static optimizer program and a dynamic optimizer program.
In some embodiments of this specification, the cryptographic operators include at least one of the following: secure multiparty computation operators, homomorphic encryption operators and zero-knowledge proof operators.
From the above description, it can be seen that the embodiments of this specification achieve the following technical effects: the corresponding privacy machine learning model can be obtained simply by replacing the plaintext operators in the preset plaintext machine learning model with the corresponding cryptographic operators; after that, the multiple data holders perform secure multiparty computation to jointly train the privacy machine learning model based on their respective private sample data and obtain the target machine learning model, so that the privacy of each data holder's private sample data can be effectively protected. In addition, the above solution does not need to add additional application programming interfaces or additional privacy data types; it only needs to replace the plaintext operators in the plaintext machine learning model with the corresponding cryptographic operators, so the operation is simple and convenient, and the usability is strong.
The embodiments of this specification also provide a computer device. For details, refer to Fig. 5, a schematic structural diagram of a computer device based on the data processing method for realizing privacy protection provided by the embodiments of this specification. The computer device may specifically include an input device 51, a processor 52 and a memory 53. The memory 53 is used to store processor-executable instructions. When the processor 52 executes the instructions, the steps of the data processing method for realizing privacy protection described in any of the above embodiments are implemented.
In this embodiment, the input device may specifically be one of the main apparatuses for exchanging information between a user and a computer system. The input device may include a keyboard, a mouse, a camera, a scanner, a light pen, a handwriting input board, a voice input device, and the like; the input device is used to input raw data, and the programs that process such data, into the computer. The input device may also obtain data transmitted from other modules, units or devices. The processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or a processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, a logic gate, a switch, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, or the like. The memory may specifically be a memory device used to store information in modern information technology. The memory may include multiple levels: in a digital system, anything that can store binary data may be a memory; in an integrated circuit, a circuit with a storage function that has no physical form is also called a memory, such as a RAM or a FIFO; in a system, a storage device with a physical form is also called a memory, such as a memory module or a TF card.
In this embodiment, the functions and effects specifically realized by the computer device can be explained with reference to other embodiments and are not described again here.
The embodiments of this specification also provide a computer storage medium based on the data processing method for realizing privacy protection. The computer storage medium stores computer program instructions, and when the computer program instructions are executed, the steps of the data processing method for realizing privacy protection described in any of the above embodiments are implemented.
In this embodiment, the above storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a cache, a Hard Disk Drive (HDD) or a Memory Card. The memory may be used to store computer program instructions. The network communication unit may be an interface, set up in accordance with a standard specified by a communication protocol, used for network connection and communication.
In this embodiment, the functions and effects specifically realized by the program instructions stored in the computer storage medium can be explained with reference to other embodiments and are not described again here.
Obviously, those skilled in the art should understand that the modules or steps of the above embodiments of this specification can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices; optionally, they can be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; and, in some cases, the steps shown or described can be performed in an order different from that herein, or they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. In this way, the embodiments of this specification are not limited to any specific combination of hardware and software.
It should be understood that the above description is intended to be illustrative and not restrictive. Many embodiments and many applications other than the examples provided will be apparent to those skilled in the art upon reading the above description. Therefore, the scope of this application should not be determined with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
The above description is only of preferred embodiments of this application and is not intended to limit this application. For those skilled in the art, the embodiments of this specification may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of this application shall be included in the protection scope of this application.

Claims (10)

  1. A data processing method for realizing privacy protection, characterized in that the method is executed by multiple data holders, each of the multiple data holders stores its own private sample data, and the method comprises:
    replacing a plaintext operator in a preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator, to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model;
    performing secure multiparty computation to jointly train the privacy machine learning model based on the private sample data stored by each of the data holders, and outputting a target machine learning model.
  2. The method according to claim 1, characterized in that replacing a plaintext operator in a preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator comprises:
    obtaining multiple cryptographic operators, and registering each of the multiple cryptographic operators;
    obtaining an optimizer program, and registering the optimizer program, wherein the optimizer program is used to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
  3. The method according to claim 2, characterized in that the optimizer program is also used to replace a plaintext operator gradient function in the preset plaintext machine learning model with a corresponding cryptographic operator gradient function;
    correspondingly, after obtaining the multiple cryptographic operators and registering each of the multiple cryptographic operators, the method further comprises:
    obtaining the cryptographic operator gradient function corresponding to each of the multiple cryptographic operators, and registering the cryptographic operator gradient function corresponding to each cryptographic operator;
    associating the registered cryptographic operator gradient function with the corresponding registered cryptographic operator.
  4. The method according to claim 1, characterized in that replacing a plaintext operator in a preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator comprises:
    obtaining an installation package file, wherein the installation package file is built based on a binary file, the binary file is obtained by compiling source code for implementing multiple cryptographic operators, source code for implementing an optimizer program, source code for registering the multiple cryptographic operators and source code for registering the optimizer program, and the optimizer program, when executed, is used to replace the plaintext operator in the plaintext model with the cryptographic operator corresponding to the plaintext operator;
    installing the installation package file;
    importing the installed files into the preset plaintext machine learning model, so as to replace the plaintext operator in the preset plaintext machine learning model with the corresponding cryptographic operator.
  5. The method according to claim 1, characterized in that replacing a plaintext operator in a preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator comprises:
    determining, based on a data flow in a data flow graph corresponding to the preset plaintext machine learning model, the plaintext operator that needs to be replaced in the preset plaintext machine learning model;
    replacing the plaintext operator that needs to be replaced in the preset plaintext machine learning model with the corresponding cryptographic operator.
  6. The method according to claim 2, characterized in that the optimizer program comprises at least one of the following: a static optimizer program and a dynamic optimizer program.
  7. The method according to claim 2, characterized in that the cryptographic operator comprises at least one of the following: a secure multiparty computation operator, a homomorphic encryption operator and a zero-knowledge proof operator.
  8. A data processing device for realizing privacy protection, characterized in that the device is located in each of multiple data holders, each of the multiple data holders stores its own private sample data, and the device comprises:
    a replacement module, configured to replace a plaintext operator in a preset plaintext machine learning model with a cryptographic operator corresponding to the plaintext operator, to obtain a privacy machine learning model corresponding to the preset plaintext machine learning model;
    a training module, configured to perform secure multiparty computation, so as to jointly train the privacy machine learning model based on the private sample data stored by each of the data holders, and output a target machine learning model.
  9. A computer device, characterized by comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements the steps of the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium on which computer instructions are stored, characterized in that, when the instructions are executed, the steps of the method according to any one of claims 1 to 7 are implemented.
PCT/CN2020/080392 2020-03-20 2020-03-20 实现隐私保护的数据处理方法和装置 WO2021184347A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080392 WO2021184347A1 (zh) 2020-03-20 2020-03-20 实现隐私保护的数据处理方法和装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080392 WO2021184347A1 (zh) 2020-03-20 2020-03-20 实现隐私保护的数据处理方法和装置

Publications (1)

Publication Number Publication Date
WO2021184347A1 true WO2021184347A1 (zh) 2021-09-23

Family

ID=77769609

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080392 WO2021184347A1 (zh) 2020-03-20 2020-03-20 实现隐私保护的数据处理方法和装置

Country Status (1)

Country Link
WO (1) WO2021184347A1 (zh)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325584A (zh) * 2018-08-10 2019-02-12 深圳前海微众银行股份有限公司 基于神经网络的联邦建模方法、设备及可读存储介质
CN110619220A (zh) * 2019-08-09 2019-12-27 北京小米移动软件有限公司 对神经网络模型加密的方法及装置、存储介质
CN110851482A (zh) * 2019-11-07 2020-02-28 支付宝(杭州)信息技术有限公司 为多个数据方提供数据模型的方法及装置
CN110795768A (zh) * 2020-01-06 2020-02-14 支付宝(杭州)信息技术有限公司 基于私有数据保护的模型学习方法、装置及系统

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114692207A (zh) * 2022-05-31 2022-07-01 蓝象智联(杭州)科技有限公司 一种隐私保护的数据处理方法、装置及存储介质
WO2023231939A1 (zh) * 2022-06-01 2023-12-07 维沃移动通信有限公司 业务处理方法、装置、网络设备及存储介质
CN116383865A (zh) * 2022-12-30 2023-07-04 上海零数众合信息科技有限公司 联邦学习预测阶段隐私保护方法及系统
CN116383865B (zh) * 2022-12-30 2023-10-10 上海零数众合信息科技有限公司 联邦学习预测阶段隐私保护方法及系统
CN115982779A (zh) * 2023-03-17 2023-04-18 北京富算科技有限公司 一种数据匿名化方法、装置、电子设备及存储介质
CN116595569A (zh) * 2023-07-19 2023-08-15 西南石油大学 一种基于联盟链的政务数据安全多方计算方法
CN116595569B (zh) * 2023-07-19 2023-09-15 西南石油大学 一种基于联盟链的政务数据安全多方计算方法

Similar Documents

Publication Publication Date Title
WO2021184347A1 (zh) 实现隐私保护的数据处理方法和装置
Hu et al. FDML: A collaborative machine learning framework for distributed features
RU2738826C1 (ru) Параллельное выполнение транзакций в сети блокчейнов
RU2731417C1 (ru) Параллельное выполнение транзакций в сети цепочек блоков на основе белых списков смарт-контрактов
CN111414646B (zh) 实现隐私保护的数据处理方法和装置
Alamri et al. Blockchain for Internet of Things (IoT) research issues challenges & future directions: A review
JP6892513B2 (ja) 信頼できる実行環境に基づいたオフチェーンスマートコントラクトサービス
Ohrimenko et al. Oblivious {Multi-Party} machine learning on trusted processors
US10467389B2 (en) Secret shared random access machine
RU2744827C2 (ru) Белые списки смарт-контрактов
WO2018113642A1 (zh) 一种面向远程计算的控制流隐藏方法及系统
CN113408746A (zh) 一种基于区块链的分布式联邦学习方法、装置及终端设备
CN110969264B (zh) 模型训练方法、分布式预测方法及其系统
Law et al. Secure collaborative training and inference for xgboost
US11409653B2 (en) Method for AI model transferring with address randomization
Chen et al. Developing privacy-preserving AI systems: The lessons learned
Leung et al. Towards privacy-preserving collaborative gradient boosted decision trees
Dolev et al. Secret shared random access machine
US12015691B2 (en) Security as a service for machine learning
Rahaman et al. Secure Multi-Party Computation (SMPC) Protocols and Privacy
Siedlecka-Lamch et al. A fast method for security protocols verification
US11657332B2 (en) Method for AI model transferring with layer randomization
WO2020211075A1 (zh) 去中心化多方安全数据处理方法、装置及存储介质
Hu et al. Stochastic distributed optimization for machine learning from decentralized features
EP4295222A1 (en) Secure collaborative processing of private inputs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20925420

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20925420

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 11/11/2022)
