CN111783124A - Data processing method and device based on privacy protection and server - Google Patents


Info

Publication number
CN111783124A
Authority
CN
China
Prior art keywords
privacy
target
operator
machine learning
algorithm
Prior art date
Legal status
Pending
Application number
CN202010644238.3A
Other languages
Chinese (zh)
Inventor
陈元丰
谢翔
晏意林
黄高峰
史俊杰
李升林
孙立林
Current Assignee
Juzix Technology Shenzhen Co ltd
Original Assignee
Juzix Technology Shenzhen Co ltd
Priority date
Filing date
Publication date
Application filed by Juzix Technology Shenzhen Co ltd filed Critical Juzix Technology Shenzhen Co ltd
Priority to CN202010644238.3A
Publication of CN111783124A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

The embodiments of the present application provide a data processing method, apparatus, and server based on privacy protection, wherein the method comprises the following steps: acquiring a plaintext machine learning model to be trained as an initial model, and determining a privacy encryption algorithm for the initial model as a target privacy algorithm; converting target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; and replacing target operators in the initial model with target privacy operators matched with the target privacy algorithm according to a preset replacement rule, so as to generate a privacy machine learning model matched with the target privacy algorithm. With this privacy machine learning model, the source code of the existing preset machine learning framework does not need to be modified subsequently, and the existing preset machine learning framework can be used directly to perform the corresponding privacy machine learning, thereby reducing data processing cost.

Description

Data processing method and device based on privacy protection and server
Technical Field
The present application relates to the field of data encryption technologies, and in particular, to a data processing method and apparatus based on privacy protection, and a server.
Background
Based on existing data processing methods, existing privacy machine learning frameworks can often only perform privacy machine learning under privacy protection based on secure multi-party computation. To perform privacy machine learning based on other privacy algorithms (for example, homomorphic encryption), a great deal of cost must be spent to additionally modify the existing machine learning framework adapted at the back end; otherwise, the functional mechanisms in the existing machine learning framework cannot be used normally.
Therefore, in concrete implementation, existing data processing methods cannot directly use an existing privacy machine learning framework to perform privacy machine learning based on privacy algorithms other than secure multi-party computation, and often suffer from the technical problems of poor usability and high processing cost.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the present application provide a data processing method, apparatus, and server based on privacy protection, aiming to solve the technical problems of existing data processing methods: an existing privacy machine learning framework cannot be directly used for privacy machine learning based on privacy algorithms other than secure multi-party computation, usability is poor, and processing cost is high.
The embodiment of the application provides a data processing method based on privacy protection, which comprises the following steps:
acquiring a plaintext machine learning model to be trained as an initial model, and determining a privacy encryption algorithm for the initial model as a target privacy algorithm;
converting target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; and replacing a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule, so as to generate a privacy machine learning model; wherein the privacy machine learning model supports privacy machine learning based on the target privacy algorithm through a preset machine learning framework.
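The two transformation steps above can be illustrated with a minimal, framework-free Python sketch (all names, such as `encode_tensors` and `replace_operators`, are hypothetical stand-ins for the preset conversion and replacement rules, not the patent's actual implementation):

```python
# Sketch of the two-step transformation: convert target tensor data to a
# privacy-compatible custom type, then swap in matched privacy operators.

def encode_tensors(model, target_tensors, privacy_algo):
    """Convert target tensor data to the custom tensor type of the privacy algorithm."""
    for name in target_tensors:
        # e.g. a native int64 tensor becomes a string-encoded custom tensor
        model["tensors"][name] = {"type": f"{privacy_algo}_custom",
                                  "was": model["tensors"][name]}
    return model

def replace_operators(model, op_map):
    """Replace each target plaintext operator with its matched privacy operator."""
    model["ops"] = [op_map.get(op, op) for op in model["ops"]]
    return model

def build_privacy_model(model, privacy_algo, target_tensors, op_map):
    model = encode_tensors(model, target_tensors, privacy_algo)
    return replace_operators(model, op_map)

plain = {"tensors": {"X": "int64", "Y": "int64"}, "ops": ["MatMul", "Add"]}
priv = build_privacy_model(plain, "HE", ["X", "Y"],
                           {"MatMul": "HEMatMul", "Add": "HEAdd"})
```

Because only the model description is rewritten, the resulting structure can still be handed to an unmodified back-end framework, which is the cost saving the disclosure claims.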
In one embodiment, the target privacy algorithm comprises at least one of: a secure multi-party computation algorithm, a homomorphic encryption algorithm, and a zero-knowledge proof algorithm.
In one embodiment, converting the target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule includes:
determining, from the initial model, native tensor data related to privacy sample data in model training and/or native tensor data related to training variable data in model training as the target tensor data;
and converting the target tensor data into tensor data of the custom tensor type matched with the target privacy algorithm according to the preset conversion rule.
In one embodiment, converting the target tensor data into tensor data of a custom tensor type matched with the target privacy algorithm according to a preset conversion rule includes:
determining a first-class conversion operator matched with the target privacy algorithm according to the target privacy algorithm, wherein the first-class conversion operator is used for converting the target tensor data into the custom tensor type matched with the target privacy algorithm;
and inserting the first-class conversion operator at the input end of an operator associated with the target tensor data in the initial model.
In one embodiment, the method further comprises:
determining a second-class conversion operator matched with the target privacy algorithm according to the target privacy algorithm, wherein the second-class conversion operator is used for restoring target tensor data of the custom tensor type to the native tensor type;
and inserting the second-class conversion operator at the output end of an operator associated with the target tensor data in the initial model.
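The two conversion operators form a matched encode/decode pair. A minimal sketch (the decimal-string encoding is purely illustrative; a real scheme would produce ciphertexts, not readable strings):

```python
# First-class conversion operator: native tensor -> custom (string) tensor.
# Second-class conversion operator: custom tensor -> native tensor.
# Strings are used because ciphertext values can exceed 64-bit width.

def encode_op(native_values):
    """Convert native numeric tensor data to the custom string tensor type."""
    return [str(v) for v in native_values]

def decode_op(custom_values):
    """Restore custom string tensor data to the native numeric type."""
    return [int(s) for s in custom_values]

x = [3, 5, 7]
assert decode_op(encode_op(x)) == x  # the round trip restores the native tensor
```

Inserting `encode_op` at an operator's input and `decode_op` at its output keeps the rest of the graph working on native tensor types.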
In one embodiment, replacing the target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule includes:
determining a target operator to be replaced from the initial model; wherein the target operator comprises: an operator through which privacy sample data flows and/or an operator through which training variable data flows;
and replacing the target operator with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule.
In one embodiment, replacing the target operator with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule includes:
determining a target privacy operator matched with the target privacy algorithm;
determining a corresponding target privacy gradient operator according to the target privacy operator;
after associating the target privacy operator with the target privacy gradient operator, registering the associated target privacy operator and target privacy gradient operator in a preset machine learning framework matched with the initial model;
replacing the target operator in the initial model with the target privacy operator by executing a replacement algorithm.
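The replacement step itself can be illustrated as a toy dataflow-graph rewrite (operator names such as `SecureMul` are hypothetical; the disclosure does not fix concrete operator names):

```python
# Toy dataflow graph: each node is (op_name, input_tensor_names, output_tensor_name).
# The replacement rule swaps plaintext operators that touch target tensor
# data for their matched privacy operators.

PRIVACY_OP_MAP = {"Mul": "SecureMul", "Add": "SecureAdd"}  # illustrative mapping

def replace_target_ops(graph, target_tensors):
    """Replace each operator through which target tensor data flows."""
    rewritten = []
    for op, inputs, output in graph:
        if output in target_tensors or any(i in target_tensors for i in inputs):
            op = PRIVACY_OP_MAP.get(op, op)
        rewritten.append((op, inputs, output))
    return rewritten

graph = [("Mul", ["X", "Y"], "Z"), ("Add", ["Z", "B"], "Out")]
# X carries privacy sample data, so the operators it flows through (and the
# operators its outputs flow through) are target operators.
secure = replace_target_ops(graph, {"X", "Z"})
```

Since `Z` is produced from private inputs, the downstream `Add` is also rewritten, mirroring how privacy propagates through the model.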
In one embodiment, after generating the privacy machine learning model, the method further comprises:
and training the privacy machine learning model by using a preset machine learning framework matched with the initial model, so as to obtain a target model that meets the requirements.
In one embodiment, the preset machine learning framework comprises at least one of the following: TensorFlow, PyTorch, and MXNet.
In one embodiment, after obtaining the target model, the method further comprises: and carrying out data processing on target data in a target service scene by using the target model.
An embodiment of the present application further provides a data processing apparatus based on privacy protection, including:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a plaintext machine learning model to be trained as an initial model and determining a privacy encryption algorithm aiming at the initial model as a target privacy algorithm;
the processing module is used for converting target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule, and replacing a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule, so as to generate a privacy machine learning model; wherein the privacy machine learning model supports privacy machine learning based on the target privacy algorithm through a preset machine learning framework.
In one embodiment, the target privacy algorithm comprises at least one of: a secure multi-party computation algorithm, a homomorphic encryption algorithm, and a zero-knowledge proof algorithm.
In one embodiment, the processing module includes a tensor type conversion unit, configured to determine, from the initial model, native tensor data related to privacy sample data in model training and/or native tensor data related to training variable data in model training as the target tensor data; and to convert the target tensor data into tensor data of the custom tensor type matched with the target privacy algorithm according to the preset conversion rule.
In one embodiment, the processing module includes a privacy operator replacing unit, configured to determine a target operator to be replaced from the initial model; wherein the target operator comprises: an operator through which privacy sample data flows and/or an operator through which training variable data flows; and replacing the target operator with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule.
In one embodiment, the apparatus further includes a training module configured to train the privacy machine learning model using a preset machine learning framework matched with the initial model to obtain a target model meeting requirements.
The embodiments of the present application further provide a server, comprising a processor and a memory storing processor-executable instructions, wherein the processor, when executing the instructions, acquires a plaintext machine learning model to be trained as an initial model and determines a privacy encryption algorithm for the initial model as a target privacy algorithm; converts target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; and replaces a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule, so as to generate a privacy machine learning model; wherein the privacy machine learning model supports privacy machine learning based on the target privacy algorithm through a preset machine learning framework.
The embodiments of the present application further provide a computer-readable storage medium storing computer instructions which, when executed, acquire a plaintext machine learning model to be trained as an initial model and determine a privacy encryption algorithm for the initial model as a target privacy algorithm; convert target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; and replace a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule, so as to generate a privacy machine learning model; wherein the privacy machine learning model supports privacy machine learning based on the target privacy algorithm through a preset machine learning framework.
The embodiment of the application further provides a data processing method based on privacy protection, which comprises the following steps:
acquiring a plaintext machine learning model to be trained as an initial model, and determining a privacy encryption algorithm for the initial model as a target privacy algorithm;
converting target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; replacing a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule, so as to generate a privacy machine learning model;
and training the privacy machine learning model by using a preset machine learning framework matched with the initial model, so as to obtain a target model that meets the requirements.
In the embodiments of the present application, target tensor data in the plaintext machine learning model to be trained is converted into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; at the same time, a target operator in the plaintext machine learning model is replaced with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule, so as to generate a privacy machine learning model matched with the target privacy algorithm. With this privacy machine learning model, the existing machine learning framework does not need to be modified and can be used directly for privacy machine learning based on a variety of privacy algorithms, thereby reducing data processing cost. This solves the technical problems of existing data processing methods: an existing privacy machine learning framework cannot be directly used for privacy machine learning based on privacy algorithms other than secure multi-party computation, usability is poor, and processing cost is high.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some of the embodiments described in the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a process flow diagram of a data processing method based on privacy protection according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an embodiment of a data processing method based on privacy protection provided by an embodiment of the present application in an example scenario;
FIG. 3 is a schematic diagram of an embodiment of a data processing method based on privacy protection provided by an embodiment of the present application in an example scenario;
FIG. 4 is a process flow diagram of a data processing method based on privacy protection according to an embodiment of the present application;
FIG. 5 is a block diagram of a data processing apparatus based on privacy protection according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a component structure of a server provided according to an embodiment of the present application;
fig. 7 is a schematic diagram of an embodiment of applying the data processing method and apparatus based on privacy protection provided by the embodiment of the present application in a scenario example.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. It is obvious that the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
Most existing privacy machine learning frameworks (such as TF-Encrypted, PySyft, CrypTen, etc.) adapt an existing mainstream AI (machine learning) framework (such as TensorFlow) as their back end. The maximum numeric bit width in these mainstream AI frameworks is 64 bits, which is the same as the numeric bit width used by the unit elements in most secure multi-party computation (MPC) algorithms. Therefore, tensor data in a model can flow between the front end of the existing privacy machine learning framework and the mainstream AI framework at the back end, and functional mechanisms such as automatic differentiation and the execution engine in the back-end AI framework can also be used normally. For this reason, based on existing data processing methods, an existing privacy machine learning framework can be directly used for privacy machine learning based on secure multi-party computation.
However, for privacy (encryption) algorithms other than secure multi-party computation, such as homomorphic encryption (HE) and zero-knowledge proof algorithms, the bit width of the unit element is usually larger than the 64 bits used by mainstream AI frameworks. For example, the unit elements of a homomorphic encryption algorithm may use values more than 1000 bits wide. As a result, the native tensor types of the mainstream AI frameworks cannot support these privacy algorithms: the tensor data they generate cannot flow normally in the back-end AI framework, and the framework's functional mechanisms cannot be used for model learning and training.
In this situation, based on existing data processing methods, using a privacy algorithm other than secure multi-party computation requires a large amount of processing time and effort to modify the source code of the mainstream AI framework adapted at the back end of the existing privacy machine learning framework, so that the AI framework can support privacy machine learning with such algorithms. This inevitably increases data processing cost.
Therefore, when existing data processing methods are implemented, an existing privacy machine learning framework cannot be directly used for privacy machine learning based on privacy algorithms other than secure multi-party computation, and the technical problems of poor usability and high processing cost often exist.
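The bit-width mismatch described above is easy to demonstrate: a homomorphic-encryption ciphertext component can be far wider than the 64-bit integers a native tensor element holds (the specific numbers below are illustrative only):

```python
# A 64-bit signed integer tops out at 2**63 - 1, but a homomorphic
# ciphertext value is commonly 1024 bits wide or more, so it cannot be
# stored in a native int64 tensor element.

INT64_MAX = 2**63 - 1

ciphertext_component = 2**1024 + 12345  # illustrative 1025-bit value

assert ciphertext_component > INT64_MAX        # does not fit in int64
assert ciphertext_component.bit_length() > 64  # needs far more than 64 bits

# One workaround: carry the value as a string, which has no fixed bit width.
as_string = str(ciphertext_component)
restored = int(as_string)
assert restored == ciphertext_component
```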
Aiming at the root cause of the above technical problems, the present application first converts target tensor data in the plaintext machine learning model to be trained into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; at the same time, it replaces target operators in the plaintext machine learning model with target privacy operators matched with the target privacy algorithm according to a preset replacement rule. A privacy machine learning model can thus be obtained that is matched with the target privacy algorithm (including privacy algorithms other than secure multi-party computation) and supported by the existing machine learning framework. Subsequently, the existing preset machine learning framework and the privacy machine learning model can be used directly to perform privacy machine learning based on the target privacy algorithm. This reduces data processing cost and solves the technical problems of existing data processing methods, namely that an existing privacy machine learning framework cannot be directly used for privacy machine learning based on privacy algorithms other than secure multi-party computation, usability is poor, and processing cost is high.
Based on this idea, the embodiments of the present application provide a data processing method based on privacy protection. Referring to fig. 1, in specific implementation, the method may include the following contents.
S101: the method comprises the steps of obtaining a plaintext machine learning model to be trained as an initial model, and determining a privacy encryption algorithm aiming at the initial model as a target privacy algorithm.
In one embodiment, the plaintext machine learning model is specifically understood as an initial model to be subjected to privacy-preserving machine learning.
In an embodiment, privacy-preserving machine learning may be specifically understood as a machine learning approach in which the initial model is trained to obtain a target model that meets the requirements and can be used for data processing in the target application scenario, while protecting the data privacy of the parameter data related to the model during training (e.g., the sample data used in model training, or the variable data generated in the training process) and preventing its leakage.
Specifically, for example, data provider A has a first type of sample data, and data provider B has a second type of sample data. Data providers A and B want to jointly train a target model using the sample data each owns, while preventing the other party from learning its own sample data, thereby protecting the data privacy of each party's sample data. In this case, with the privacy-preserving machine learning described above, the sample data owned by both parties can be used comprehensively to jointly train the corresponding target model, on the premise that the data privacy of both parties' sample data is protected.
In an embodiment, the obtaining of the plaintext machine learning model to be trained as the initial model may include: and receiving a data flow graph (or called a computational graph, a logic graph and the like) corresponding to the plaintext machine learning model to be trained as the initial model.
In one embodiment, the plaintext machine learning model may be a model generated based on an existing plaintext machine learning framework.
In one embodiment, the target privacy algorithm may be specifically understood as a privacy encryption algorithm used in privacy-preserving machine learning of the initial model.
In one embodiment, the target privacy algorithm may be a secure multi-party computation (MPC) algorithm, or another privacy algorithm besides secure multi-party computation, for example, a homomorphic encryption (HE) algorithm, a zero-knowledge proof algorithm, and so on. Of course, the privacy algorithms listed above are only exemplary; depending on the specific situation and processing needs, other types of privacy algorithms beyond those listed may also be included. The present specification is not limited in this respect.
In one embodiment, in specific implementation, a selection instruction of a user for a privacy algorithm may be received, and the privacy algorithm selected by the user is determined as a target privacy algorithm according to the selection instruction.
For example, a privacy algorithm setting interface can be displayed to the user, and the name or the number of the privacy algorithm input in the privacy algorithm setting interface by the user is received; and determining a corresponding privacy algorithm according to the name or the number, and determining the privacy algorithm as a target privacy algorithm.
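The name-or-number selection described above can be sketched as a simple lookup (the registry contents and identifiers are illustrative; the disclosure does not fix specific names or numbers):

```python
# Map user-supplied names or numbers to privacy algorithms.
# Entries are illustrative placeholders only.

PRIVACY_ALGORITHMS = {
    "1": "MPC",  # secure multi-party computation
    "2": "HE",   # homomorphic encryption
    "3": "ZKP",  # zero-knowledge proof
    "mpc": "MPC",
    "he": "HE",
    "zkp": "ZKP",
}

def select_target_privacy_algorithm(user_input):
    """Resolve the user's name or number to the target privacy algorithm."""
    key = str(user_input).strip().lower()
    try:
        return PRIVACY_ALGORITHMS[key]
    except KeyError:
        raise ValueError(f"unknown privacy algorithm: {user_input!r}")
```

The same lookup could also be driven by a server-side matching rule instead of direct user input, as described in the next paragraph.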
In one embodiment, during specific implementation, the server may also actively acquire the user's privacy protection requirements and, according to these requirements and in combination with the user's specific situation and usage scenario, automatically match a suitable privacy algorithm for the user as the target privacy algorithm.
S102: converting the target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; replacing a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule to generate a privacy machine learning model; the privacy machine learning model supports privacy machine learning based on a target privacy algorithm through a preset machine learning frame.
In one embodiment, the privacy machine learning model may be generated by converting the related native tensor data in the initial model into the custom tensor type matched with the target privacy algorithm, and at the same time replacing the related operators in the initial model with privacy operators matched with the target privacy algorithm. This yields a privacy machine learning model that is matched with the target privacy algorithm and supported by the existing preset machine learning framework adapted at the back end, so that this framework can subsequently be used directly to perform privacy machine learning on the privacy machine learning model based on the target privacy algorithm.
In an embodiment, converting the target tensor data in the initial model into the custom tensor type matched with the target privacy algorithm according to the preset conversion rule may include the following steps: determining, from the initial model, native tensor data related to privacy sample data in model training and/or native tensor data related to training variable data in model training as the target tensor data; and converting the target tensor data into tensor data of the custom tensor type matched with the target privacy algorithm according to the preset conversion rule.
In this embodiment, it is considered that in most application scenarios, a user often wants to protect the data privacy of the sample data used during model training and/or the training variable data generated in the training process, so as to prevent this data from being leaked. Therefore, the native tensor data in the initial model related to privacy sample data in model training, and/or the native tensor data related to training variable data in model training, may be treated by default as the target tensor data.
Of course, in specific implementation, the user can additionally designate other data to be protected according to their own requirements. Accordingly, the native tensor data in the initial model related to such user-designated data can also be determined as target tensor data.
In one embodiment, the custom tensor type matched with the target privacy algorithm may specifically be a custom tensor type whose numeric bit width is the same as the numeric bit width used by the unit elements of the target privacy algorithm. The custom tensor type may specifically include a string type. Of course, the custom tensor type may also include other custom tensor types, as the case may be.
In one embodiment, the native tensor type of the target tensor data in the initial model is converted into the custom tensor type matched with the target privacy algorithm in the above manner, so that the target tensor data in the model can support the corresponding target privacy algorithm.
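A minimal sketch of a string-backed custom tensor type of the kind described above (purely illustrative; real privacy frameworks define richer tensor types with shape and device metadata):

```python
# A tiny string-backed "custom tensor": each element is stored as a
# decimal string, so arbitrarily wide ciphertext values fit regardless
# of the framework's native 64-bit limit. Illustrative only.

class StringTensor:
    def __init__(self, values):
        # accept native ints (or pre-encoded strings) and store as strings
        self.data = [str(v) for v in values]

    def to_native(self):
        """Restore the elements to native Python ints (the native tensor type)."""
        return [int(s) for s in self.data]

wide = StringTensor([2**100, 2**200])  # values far beyond 64-bit width
assert wide.to_native() == [2**100, 2**200]
```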
In an embodiment, the converting, according to a preset conversion rule, the target tensor data into tensor data of a custom tensor type matched with the target privacy algorithm may include: determining a first class conversion operator matched with a target privacy algorithm according to the target privacy algorithm; the first-class conversion operator is used for converting the target tensor data into a custom tensor type matched with a target privacy algorithm; inserting said first class of conversion operators at the input of an operator associated with the target tensor data in said initial model.
In one embodiment, the first-class conversion operator can be specifically understood as a custom tensor conversion operator (which may be denoted as Encode OP, or Encode). The custom tensor conversion operator converts the input native tensor data into tensor data of the custom tensor type matched with the target privacy algorithm as output. The operators related to the target tensor data may specifically include the operators through which the target tensor data flows.
Specifically, for example, as shown in fig. 2, the native tensor data X, Y, X1 and Y1 in the initial model are target tensor data; X and Y flow through Mul1, and X1 and Y1 flow through Mul2. Therefore, the operators Mul1 and Mul2 can be taken as operators related to the target tensor data, and corresponding first-class conversion operators can be inserted at the input end positions of Mul1 and Mul2, respectively. In this way, the target tensor data X and Y are converted by a first-class conversion operator (for example, Encode) into tensor data of the custom tensor type matched with the target privacy algorithm before flowing into operator Mul1 (and likewise X1 and Y1 before flowing into Mul2), and the converted tensor data then flows into the corresponding operator Mul1 or Mul2 for processing.
It should be noted that the custom tensor type matched with the target privacy algorithm supports privacy operators based on the target privacy algorithm, and also supports use of the functional mechanisms of the existing machine learning framework.
In one embodiment, while the target tensor data in the initial model is converted into the tensor data of the custom tensor type matched with the target privacy algorithm, the target operator in the initial model can be replaced by the target privacy operator matched with the target privacy algorithm, so that the original plaintext machine learning model can be converted into the corresponding privacy machine learning model.
In an embodiment, the replacing, according to a preset replacement rule, the target operator in the initial model with the target privacy operator matched with the target privacy algorithm may include the following steps: determining a target operator to be replaced from the initial model; wherein the target operator comprises: an operator through which privacy sample data flows and/or an operator through which training variable data flows; and replacing the target operator with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule.
In this embodiment, the target operator may specifically be an operator through which privacy sample data flows, and/or an operator through which training variable data flows. The user may specify, according to his or her particular situation, which operators require privacy encryption and the like.
In this embodiment, the target privacy operator may be specifically understood as a target operator that is subjected to privacy encryption processing by a target privacy algorithm.
In one embodiment, replacing the target operator with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule includes: determining a target privacy operator matched with the target privacy algorithm; determining a corresponding target privacy gradient operator according to the target privacy operator; after associating the target privacy operator with the target privacy gradient operator, registering the associated pair in a preset machine learning framework matched with the initial model; and replacing the target operator in the initial model with the target privacy operator by executing a replacement algorithm.
In an embodiment, determining the target privacy operator matched with the target privacy algorithm may be implemented as follows: the target operator is processed with the target privacy algorithm, and a corresponding target privacy operator is generated in an executable file (such as a dll file).
In an embodiment, determining the target privacy operator matched with the target privacy algorithm may alternatively be implemented as follows: before specific implementation, preset privacy operators respectively matched with various privacy algorithms are configured for each operator in advance and stored in a database; during specific implementation, the database is retrieved according to the target operator and the target privacy algorithm, and the preset privacy operator matched with both is retrieved from the plurality of preset privacy operators as the target privacy operator. In this way, a corresponding target privacy operator does not need to be generated on the fly each time, which reduces repetitive work and improves overall processing efficiency.
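The pre-configured database lookup described above can be sketched as a mapping keyed by (operator, privacy algorithm). The table contents, key scheme, and names below are purely illustrative assumptions:

```python
# Hypothetical pre-configured database: preset privacy operators keyed by
# (target operator, target privacy algorithm), populated in advance.
PRESET_PRIVACY_OPS = {
    ("Mul", "HE"):  "HeMul",   # homomorphic multiplication
    ("Add", "HE"):  "HeAdd",   # homomorphic addition
    ("Mul", "MPC"): "MpcMul",  # secure multi-party multiplication
}

def lookup_privacy_op(target_op, target_algorithm):
    """Retrieve the preset privacy operator matching both the target
    operator and the target privacy algorithm; None if not configured."""
    return PRESET_PRIVACY_OPS.get((target_op, target_algorithm))

# Retrieval replaces on-the-fly generation of the privacy operator.
op = lookup_privacy_op("Mul", "HE")   # -> "HeMul"
```

Keying on both the operator and the algorithm is what lets one database serve several privacy algorithms at once, which is the efficiency gain the embodiment describes.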
In one embodiment, after the target privacy operator is determined, a corresponding target privacy gradient operator may further be determined. During specific implementation, the gradient function can be derived from the target privacy operator to obtain the corresponding target privacy gradient operator. Alternatively, a target privacy gradient operator corresponding to the target privacy operator can be found by searching the database, which can also store pre-configured preset gradient privacy operators corresponding to the preset privacy operators.
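The pairing of a privacy operator with its gradient operator can be illustrated with a stand-in forward function. A real HeMul would operate on ciphertexts; the plain multiplication and all names below are assumptions for illustration only:

```python
def he_mul(x, y):
    """Stand-in target privacy operator: z = x * y (a real homomorphic
    version would multiply ciphertexts rather than plain floats)."""
    return x * y

def he_mul_grad(x, y, dz):
    """Matching target privacy gradient operator: for z = x * y, the
    partials are dz/dx = y and dz/dy = x, each scaled by the upstream
    gradient dz, exactly as a framework's autodiff would require."""
    return dz * y, dz * x

gx, gy = he_mul_grad(3.0, 4.0, 1.0)   # -> (4.0, 3.0)
```

Whether the gradient operator is derived analytically, as here, or retrieved from the database, the key point is that every privacy operator carries a matching gradient operator so that back-propagation stays well defined.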
In an embodiment, during specific replacement, the target privacy operator and the corresponding target privacy gradient operator may be associated, and the associated target privacy operator and target privacy gradient operator are registered in a preset machine learning framework. Further, the target operators in the initial model may be replaced by corresponding target privacy operators by executing a replacement algorithm. The preset machine learning framework may be an existing plaintext machine learning framework matched with the initial model.
For a specific example, reference may be made to fig. 3. Suppose the target privacy algorithm selected by the user is a homomorphic encryption algorithm, and the target operators in the initial model comprise: Mul1, Mul2, and Add. During replacement, target privacy operators based on the homomorphic encryption algorithm can be determined according to the target operators, comprising: HeMul1, HeMul2, and HeAdd, together with the corresponding target privacy gradient operators. The target privacy operators and target privacy gradient operators are then associated and registered in the preset machine learning framework matched with the initial model, and the original target operators Mul1, Mul2 and Add in the initial model are respectively replaced by HeMul1, HeMul2 and HeAdd by executing the replacement algorithm.
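The registration-then-replacement flow of fig. 3 can be sketched as follows. The mock registry, the replacement table, and every name here are illustrative assumptions, not the framework's actual registration API:

```python
# Mock stand-in for the preset machine learning framework's op registry.
OP_REGISTRY = {}

def register_privacy_op(name, grad_name):
    """Associate a target privacy operator with its target privacy
    gradient operator and register the pair in the (mock) framework."""
    OP_REGISTRY[name] = {"grad": grad_name}

def replace_ops(model_ops, replacement_table):
    """Replacement algorithm: substitute each target operator in the
    model with its matched privacy operator; leave others untouched."""
    return [replacement_table.get(op, op) for op in model_ops]

# Register the homomorphic operators with their gradient counterparts.
for op, grad in [("HeMul", "HeMulGrad"), ("HeAdd", "HeAddGrad")]:
    register_privacy_op(op, grad)

# The fig. 3 replacement: Mul1 -> HeMul1, Mul2 -> HeMul2, Add -> HeAdd.
table = {"Mul1": "HeMul1", "Mul2": "HeMul2", "Add": "HeAdd"}
privacy_model = replace_ops(["Mul1", "Mul2", "Add"], table)
# privacy_model == ["HeMul1", "HeMul2", "HeAdd"]
```

Registering the gradient operator alongside the forward operator before the swap is what keeps the framework's automatic differentiation usable on the resulting privacy model.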
By the above method, the target tensor data in the initial model are converted into the custom tensor type matched with the target privacy algorithm, and the target operators in the initial model are replaced by the target privacy operators matched with the target privacy algorithm, so that the privacy machine learning model corresponding to the plaintext machine learning model is obtained. The privacy machine learning model supports not only the target privacy algorithm but also functional mechanisms such as automatic differentiation and the execution engine in the existing preset machine learning framework.
In an embodiment, the preset machine learning framework may specifically include at least one of: TensorFlow, PyTorch, MXNet, and the like.
Of course, it should be noted that the preset machine learning frameworks listed above are only exemplary illustrations. In particular implementations, other types of machine learning frameworks may also be used, depending on the particular circumstances and user preferences. The present specification is not limited in this respect.
In an embodiment, after the generating the privacy machine learning model, when the method is implemented, the following may be further included: and training the privacy machine learning model by utilizing a preset machine learning framework matched with the initial model so as to obtain a target model meeting the requirement.
In one embodiment, during implementation, sample data may be obtained; the privacy machine learning model is then executed by the preset machine learning framework to learn from the sample data while protecting data privacy, and the model is trained to obtain a target model meeting the requirements.
In the embodiment of the present application, since the target tensor data in the privacy machine learning model are converted into the custom tensor type matched with the target privacy algorithm, the target privacy algorithm (including privacy algorithms other than secure multi-party computation) can be supported, while the tensor data can still flow normally in the preset machine learning framework. Moreover, since the associated target privacy operators and target privacy gradient operators are registered in the preset machine learning framework, functional mechanisms such as automatic differentiation and the execution engine in the back-end preset machine learning framework can be used normally while the privacy machine learning model is trained by that framework. The user does not need to spend time and effort additionally modifying the source code of the preset machine learning framework, which improves the usability of the machine learning framework, reduces the data processing cost of privacy machine learning, and allows the user to conveniently and efficiently use the existing machine learning framework for privacy machine learning based on privacy algorithms other than secure multi-party computation, thereby simplifying user operation and improving user experience.
In an embodiment, when the method is implemented, the following may be further included: determining a second-class conversion operator matched with the target privacy algorithm according to the target privacy algorithm; wherein the second-class conversion operator is used for restoring tensor data of the custom tensor type to the native tensor type; and inserting the second-class conversion operator at the output end of an operator related to the target tensor data in the initial model.
In this embodiment, the second-class conversion operator may be understood as another custom tensor conversion operator (which may be denoted as Decode OP, or Decode for short); it restores the input tensor data of the custom tensor type to tensor data of the native tensor type as output. The operator related to the target tensor data here may specifically be an operator from which data to be fed back to the user flows out.
In the present embodiment, it should be considered that, after conversion by the first-class conversion operators, the target tensor data in the model have all been converted into tensor data of the custom tensor type matched with the target privacy algorithm. Although tensor data of the custom tensor type can simultaneously support the target privacy algorithm and the existing preset machine learning framework, the custom tensor type is often not user-friendly: for example, a user may not be able to efficiently interpret and understand what data of the custom tensor type represent.
Therefore, so that the user can conveniently interpret and understand the relevant data in the model, a second-class conversion operator can be inserted at the output end of the operator related to the target tensor data during specific implementation, restoring the data output by that operator from the custom tensor type to the native tensor type. This facilitates reading and understanding by the user and improves the user experience.
Specifically, for example, as shown in fig. 2, the second-class conversion operator (e.g., Decode) may be inserted at the output end position of the operator _ret_, from which the result data to be fed back to the user flow out. In this way, the custom-type result data flowing out of _ret_ can be restored by the second-class conversion operator to the native tensor type, which the user can interpret and understand, before being fed back to the user.
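The Encode/Decode round trip at the output end can be sketched with a toy encoding. The string scheme below (an integer prefixed by a tag) is purely an illustrative assumption; a real implementation would encode according to the chosen privacy algorithm:

```python
def encode_native(value):
    """Illustrative Encode: native value -> custom (string) form."""
    return f"custom:{value}"

def decode_custom(custom_value):
    """Illustrative Decode (second-class conversion operator): custom
    form -> native value the user can interpret and understand."""
    assert custom_value.startswith("custom:")
    return int(custom_value[len("custom:"):])

# The result flowing out of the _ret_ operator is still custom-typed...
ret_output = encode_native(42)
# ...so a Decode op at its output end restores the native tensor type
# before the result is fed back to the user.
user_visible = decode_custom(ret_output)   # -> 42
```

The point of the insertion position is that Decode sits only at user-facing outputs: inside the graph the data stay in the custom type, and only the final result is made human-readable.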
In an embodiment, after obtaining the target model, when the method is implemented, the following may be further included: and carrying out data processing on target data in a target service scene by using the target model.
In an embodiment, the target service scenario may specifically include: a transaction prediction scenario, a risk identification scenario, a customer service response scenario, and the like. Accordingly, the target data may be different types of data to be processed corresponding to different target service scenarios.
Specifically, for example, when the target service scenario is a customer service response scenario, the target model may be a customer service response model trained through privacy machine learning, and the target data may be the text of a message left by a customer in a customer service group. Correspondingly, performing data processing on the target data in the target service scenario using the target model may specifically comprise: invoking the customer service response model to identify and process the text of the customer's message, generating a response text for that message, and automatically feeding the response text back to the customer, thereby realizing automatic responses to questions asked by customers in the customer service group.
From the above description, it can be seen that the data processing method based on privacy protection provided in the embodiments of the present application converts the target tensor data in the plaintext machine learning model to be trained into the custom tensor type matched with the target privacy algorithm according to a preset conversion rule, and at the same time replaces the target operators in that model with target privacy operators matched with the target privacy algorithm according to a preset replacement rule, so as to generate a privacy machine learning model matched with the target privacy algorithm. With this privacy machine learning model, the existing machine learning framework matched at the back end does not need to be subsequently modified; privacy machine learning based on various privacy algorithms can be performed directly with the existing machine learning framework, reducing data processing cost. This solves the technical problems of existing data processing methods, namely that the existing machine learning framework cannot be directly used for privacy machine learning based on privacy algorithms other than secure multi-party computation, poor usability, and high processing cost.
Referring to fig. 4, another data processing method based on privacy protection is provided in the embodiments of the present disclosure, and when the method is implemented, the following may be included.
S401: the method comprises the steps of obtaining a plaintext machine learning model to be trained as an initial model, and determining a privacy encryption algorithm aiming at the initial model as a target privacy algorithm.
S402: converting the target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; and replacing the target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule so as to generate a privacy machine learning model.
S403: and training the privacy machine learning model by utilizing a preset machine learning framework matched with the initial model so as to obtain a target model meeting the requirement.
In this embodiment, after the privacy machine learning model is established in the above manner, the existing preset machine learning framework (e.g., a mainstream AI framework) can be directly utilized to efficiently perform privacy machine learning based on privacy algorithms other than secure multi-party computation, without additionally modifying the framework, thereby simplifying user operation and reducing the data processing cost of privacy machine learning.
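Steps S401 to S403 can be tied together in a toy end-to-end pipeline. The list-of-ops model, the naming convention, and the mock training step are all illustrative assumptions rather than the patent's implementation:

```python
def build_privacy_model(plain_model, algorithm):
    """S402: replace target operators with privacy operators matched
    with the target privacy algorithm (tensor conversion omitted here)."""
    replaced = {"Mul": f"{algorithm}Mul", "Add": f"{algorithm}Add"}
    return [replaced.get(op, op) for op in plain_model]

def train(privacy_model, framework="TensorFlow"):
    """S403: hand the privacy model to the (mock) preset machine
    learning framework matched with the initial model for training."""
    return {"framework": framework, "ops": privacy_model, "trained": True}

plain_model = ["Mul", "Add"]                        # S401: initial model
privacy_model = build_privacy_model(plain_model, "He")
target_model = train(privacy_model)
# target_model["ops"] == ["HeMul", "HeAdd"]
```

The division of labor matches the three steps: model acquisition, conversion plus replacement, and then training within the unmodified preset framework.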
Based on the same inventive concept, embodiments of the present application further provide a data processing apparatus based on privacy protection, as described in the following embodiments. Because the principle of solving the problem of the data processing device based on privacy protection is similar to that of the data processing method based on privacy protection, the implementation of the data processing device based on privacy protection can be referred to the implementation of the data processing method based on privacy protection, and repeated details are not repeated. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated. Please refer to fig. 5, which is a block diagram of a data processing apparatus based on privacy protection according to an embodiment of the present application, where the apparatus may specifically include: an acquisition module 501 and a processing module 502, and the structure will be described in detail below.
The acquisition module is used for acquiring a plaintext machine learning model to be trained as an initial model, and determining a privacy encryption algorithm for the initial model as a target privacy algorithm;
the processing module is used for converting the target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; and replacing a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule to generate a privacy machine learning model; wherein the privacy machine learning model supports privacy machine learning based on the target privacy algorithm through a preset machine learning framework.
In one embodiment, the target privacy algorithm may specifically include at least one of: a secure multi-party calculation algorithm, a homomorphic encryption algorithm, a zero-knowledge proof algorithm, etc.
In an embodiment, the processing module includes a tensor type conversion unit, which may be specifically configured to determine, from the initial model, native tensor data related to privacy sample data in model training and/or native tensor data related to training variable data in model training as target tensor data; and converting the target tensor data into tensor data of a user-defined tensor type matched with the target privacy algorithm according to a preset conversion rule.
In one embodiment, the processing module includes a privacy operator replacing unit, configured to determine a target operator to be replaced from the initial model; wherein the target operator comprises: an operator through which privacy sample data flows and/or an operator through which training variable data flows; and replacing the target operator with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule.
In one embodiment, the apparatus further includes a training module configured to train the privacy machine learning model using a preset machine learning framework matched with the initial model to obtain a target model meeting requirements.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It should be noted that, the systems, devices, modules or units described in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. For convenience of description, in the present specification, the above devices are described as being divided into various units by functions, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
Moreover, in the subject specification, adjectives such as first and second may only be used to distinguish one element or action from another element or action without necessarily requiring or implying any actual such relationship or order. References to an element or component or step (etc.) should not be construed as limited to only one of the element, component, or step, but rather to one or more of the element, component, or step, etc., where the context permits.
From the above description, it can be seen that, by using the data processing apparatus based on privacy protection provided in the embodiments of the present application to establish the privacy machine learning model for model training, the existing machine learning framework matched at the back end does not need to be modified, and privacy machine learning based on multiple privacy algorithms can be performed directly with the existing machine learning framework, thereby reducing data processing cost. This solves the technical problems of existing data processing methods, namely that the existing machine learning framework cannot be directly used for privacy machine learning based on privacy algorithms other than secure multi-party computation, poor usability, and high processing cost.
The present specification further provides a server, which can be seen in fig. 6, and the server includes a network communication port 601, a processor 602, and a memory 603, and the above structures are connected by an internal cable, so that the structures can perform specific data interaction.
The network communication port 601 may be specifically configured to obtain a plaintext machine learning model to be trained as an initial model, and determine a privacy encryption algorithm for the initial model as a target privacy algorithm.
The processor 602 may be specifically configured to convert, according to a preset conversion rule, the target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm; replacing a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule to generate a privacy machine learning model; the privacy machine learning model supports privacy machine learning based on a target privacy algorithm through a preset machine learning frame.
The memory 603 may be specifically configured to store a corresponding instruction program.
In this embodiment, the network communication port 601 may be a virtual port bound with different communication protocols, so that different data can be sent or received. For example, the network communication port may be a port responsible for web data communication, a port responsible for FTP data communication, or a port responsible for mail data communication. In addition, the network communication port can also be a physical communication interface or communication chip. For example, it may be a wireless mobile network communication chip, such as a GSM or CDMA chip; it may also be a Wi-Fi chip or a Bluetooth chip.
In this embodiment, the processor 602 may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The description is not intended to be limiting.
In this embodiment, the memory 603 may include multiple layers, and in a digital system, the memory may be any memory as long as binary data can be stored; in an integrated circuit, a circuit without a physical form and with a storage function is also called a memory, such as a RAM, a FIFO and the like; in the system, the storage device in physical form is also called a memory, such as a memory bank, a TF card and the like.
An embodiment of the present application further provides a computer storage medium of a data processing method based on privacy protection, where the computer storage medium stores computer program instructions, and when the computer program instructions are executed, the computer storage medium implements: acquiring a plaintext machine learning model to be trained as an initial model, and determining a privacy encryption algorithm aiming at the initial model as a target privacy algorithm; converting the target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; replacing a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule to generate a privacy machine learning model; the privacy machine learning model supports privacy machine learning based on a target privacy algorithm through a preset machine learning frame.
In the present embodiment, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard disk (HDD), or a Memory Card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, which is set in accordance with a standard prescribed by a communication protocol.
In this embodiment, the functions and effects specifically realized by the program instructions stored in the computer storage medium can be explained by comparing with other embodiments, and are not described herein again.
In a specific implementation scenario example, the data processing method and apparatus based on privacy protection provided in the embodiments of the present application may be applied to efficiently convert a plaintext machine learning model into a privacy machine learning model that simultaneously supports a target privacy algorithm and supports the functional mechanisms of an existing machine learning framework. The specific implementation process may be as follows.
A privacy machine learning framework is a combination of cryptography and AI (i.e., machine learning) that can solve the data privacy problems facing today's data industry. Mainstream privacy machine learning frameworks include, for example, TF-Encrypted, PySyft, and CrypTen; their back ends are usually adapted to mainstream AI frameworks such as TensorFlow and PyTorch. These privacy AI frameworks look much like mainstream AI frameworks, taking advantage of the ease of use of the framework APIs, and train and predict on encrypted data through cryptographic algorithms such as MPC (secure multi-party computation) and HE (homomorphic encryption), so that users can perform privacy-preserving machine learning without being proficient in cryptography, distributed systems, or high-performance computing. However, most of these privacy AI frameworks only support secure multi-party computation algorithms; other cryptographic algorithms (i.e., privacy algorithms) such as homomorphic encryption are not supported. The most important reason is that a mainstream AI framework is used at the back end, and the bit width of numeric types in mainstream AI frameworks is at most 64 bits, which happens to be consistent with the bit width of most MPC algorithm unit elements; tensors (i.e., tensor data) can therefore flow between the front end and the back-end AI framework of the privacy framework while supporting automatic differentiation, the execution engine, and other functional mechanisms of the mainstream AI framework.
It is therefore easy to extend such frameworks to support generic MPC algorithms; but to extend them to support other privacy algorithms besides MPC, such as homomorphic encryption (HE), the native numeric types of the mainstream AI framework cannot be used, because the bit width of the values used by the unit elements of a homomorphic algorithm is large, sometimes exceeding 1000 bits, so a custom data type must be supported instead. However, after extending with a custom data type, mechanisms such as automatic differentiation in the mainstream AI framework can no longer be used; and without extending to a custom data type, such privacy algorithms cannot be supported at all. This is the main problem faced by current privacy AI frameworks.
In this scenario example, further analyzing the pain points that prevent current mainstream privacy AI frameworks from easily extending to support multiple privacy algorithms, the following cases may specifically be identified.
1. The back ends of mainstream privacy AI frameworks use mainstream AI frameworks such as TensorFlow and PyTorch, but the native numeric type bit width supported by these AI frameworks is 64 bits, while the numeric bit widths of the unit elements of various privacy algorithms are far larger than 64 bits. A privacy machine learning framework therefore needs a scheme that extends a custom data type in order to support multiple privacy algorithms. However, after a custom data type is extended, the front-end data tensors of the privacy machine learning framework cannot flow normally in the back-end AI framework; even if they can flow, the semantics change, i.e., one front-end tensor corresponds to multiple back-end tensors, and the number of back-end tensors corresponding to one front-end tensor differs between privacy algorithms, so usability, maintainability, and extensibility are poor. In addition, after extending in this way with a custom data type, it is difficult to reuse functional mechanisms such as automatic differentiation in the back-end AI framework.
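The bit-width mismatch can be made concrete: a native 64-bit numeric type cannot hold a privacy-algorithm unit element, but a variable-length string/bytes value can. The 2000-bit value below is an arbitrary stand-in for an HE ciphertext, not real encrypted data:

```python
# A stand-in "ciphertext": far larger than any 64-bit native numeric type.
ciphertext = 2 ** 2000 + 12345
assert ciphertext.bit_length() > 64   # would not fit an int64 tensor slot

# Serialize it to a variable-length byte string, the way a string-typed
# tensor element in the back-end framework could carry it.
as_bytes = ciphertext.to_bytes((ciphertext.bit_length() + 7) // 8, "big")

# The operator side recovers the original value losslessly.
restored = int.from_bytes(as_bytes, "big")
```

This is exactly why the scheme below falls back on a string-typed carrier: its length is unbounded, so one representation accommodates unit elements of any privacy algorithm.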
2. If the string data type of the plaintext AI framework is used to support multiple privacy algorithms by extension, any privacy algorithm can be supported, because the bit width of the string data type is variable. However, string-typed variables do not support most mathematical operations, such as matrix multiplication and squaring; even if such operations can be supported by modification, the native AI framework semantics change, which is difficult for users to understand and leads to a poor user experience. For example, matrix multiplication of two string-typed variables is not at all understandable to the user.
In view of the above, the present scenario example proposes a solution that uses a custom data type extension to support multiple privacy algorithms, which may solve the above problems. The specific solution is as follows.
First, a privacy custom Tensor class named PctTensor is implemented. A mainstream machine learning Tensor object is aggregated inside PctTensor; a native Tensor in a mainstream machine learning framework can be converted into a PctTensor (i.e., the custom type), and a PctTensor can also be converted back into a Tensor of the mainstream machine learning framework. PctTensor also overloads common mathematical operators, including matrix multiplication, all implemented as custom OPs, and the input and output data inside a custom OP (overloaded operation) can be stored using the string type. Thus, the upper layer can encode the user-layer data according to the chosen privacy algorithm and propagate string-typed data downward; a custom OP then decodes the string-typed data, performs the semantically appropriate privacy computation on the decoded data, and, once the result is computed, encodes it back to the string type and propagates it downward, until execution of the whole computation graph is complete. Because the unit-element bit widths used by different privacy algorithms differ, the data are encoded to the string type; before executing, a privacy operator first decodes, then performs the privacy computation, and after the privacy computation is complete encodes the result back to the string type and propagates it downward.
In this way, any privacy algorithm (including privacy algorithms other than secure multi-party computation) can be supported. Because string is a native data type of the back-end mainstream AI framework, functional mechanisms such as the automatic differentiation mechanism and the execution engine of the back-end machine learning framework can be reused with only slight modification. In addition, the string types used inside the privacy operators are transparent to users, who can continue to write their code with numerical data types.
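The encode/compute/decode flow described above can be sketched in plain Python. This is a hypothetical, framework-agnostic illustration: `PctTensor`, `encode`, and `decode` stand in for the real framework classes, the string payload is plain JSON, and the "privacy computation" inside the overloaded operator is an ordinary matrix multiplication used as a placeholder for a real cryptographic algorithm.

```python
import json

def encode(values, algo="demo"):
    """Encode native numeric data into a string payload (bit width varies per algorithm)."""
    return json.dumps({"algo": algo, "data": values})

def decode(payload):
    """Decode the string payload back into numeric data for the privacy computation."""
    return json.loads(payload)["data"]

class PctTensor:
    """Custom tensor class that carries string-typed payloads between custom OPs."""
    def __init__(self, payload):
        self.payload = payload  # string type: accommodates any privacy algorithm's bit width

    @classmethod
    def from_native(cls, values):
        """Encode OP: native tensor (here, nested lists) -> custom tensor."""
        return cls(encode(values))

    def to_native(self):
        """Decode OP: custom tensor -> native tensor."""
        return decode(self.payload)

    def __matmul__(self, other):
        """Overloaded matrix multiplication, implemented as a custom OP:
        decode -> privacy computation (plain matmul as a stand-in) -> re-encode."""
        a, b = decode(self.payload), decode(other.payload)
        rows = [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]
        return PctTensor(encode(rows))

a = PctTensor.from_native([[1, 2], [3, 4]])
b = PctTensor.from_native([[5, 6], [7, 8]])
print((a @ b).to_native())  # [[19, 22], [43, 50]]
```

From the user's point of view only numeric data goes in and comes out; the string payloads stay hidden inside the custom OPs, mirroring the transparency property claimed above.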
In the present scenario example, based on the above scheme, a processing apparatus comprising the following modules may be constructed. Specifically, the processing apparatus may include: a custom tensor module, a tensor conversion module, a privacy operator module, and an operator replacement module. Wherein:
1) The custom tensor module is mainly used to implement the custom tensor class and, within that class, the mathematical operation functions commonly used in machine learning;
2) The tensor conversion module is mainly used to implement the operators that convert between the AI framework's native tensor and the custom tensor, including a native-tensor conversion operator and a custom-tensor conversion operator. The process of converting a native tensor into a custom tensor may be called Encode, and the process of converting a custom tensor into a native tensor may be called Decode; different privacy algorithms may implement Encode and Decode differently;
3) The privacy operator module mainly implements the various privacy operators and their corresponding privacy gradient operators, for example: secure multi-party computation operators, homomorphic operators, zero-knowledge proof operators, and the like;
4) The operator replacement module mainly implements the function of replacing original operators in the privacy AI framework with privacy operators; it can replace a plaintext operator with the corresponding privacy operator (i.e., the target privacy operator matched with the target privacy algorithm) according to the privacy algorithm selected by the user.
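The operator replacement module's behavior can be sketched as a lookup over a registry keyed by the user-selected privacy algorithm. Everything here is illustrative: the registry contents, operator names (`HeMatMul`, `MpcAdd`, etc.), and the dict-based graph representation are hypothetical placeholders, not the patent's actual data structures.

```python
# Hypothetical registry: privacy algorithm -> {plaintext op -> (privacy op, privacy gradient op)}
PRIVACY_OP_REGISTRY = {
    "homomorphic": {"MatMul": ("HeMatMul", "HeMatMulGrad"),
                    "Add":    ("HeAdd",    "HeAddGrad")},
    "mpc":         {"MatMul": ("MpcMatMul", "MpcMatMulGrad"),
                    "Add":    ("MpcAdd",    "MpcAddGrad")},
}

def replace_operators(graph, algorithm):
    """Replace each plaintext operator with the target privacy operator
    matched with the selected privacy algorithm, keeping the gradient association."""
    table = PRIVACY_OP_REGISTRY[algorithm]
    privacy_graph = []
    for node in graph:
        if node["op"] in table:
            privacy_op, grad_op = table[node["op"]]
            node = dict(node, op=privacy_op, grad=grad_op)  # swap op, attach its gradient op
        privacy_graph.append(node)
    return privacy_graph

model = [{"op": "MatMul", "inputs": ["x", "w"], "output": "y"},
         {"op": "Add", "inputs": ["y", "b"], "output": "out"}]
print(replace_operators(model, "homomorphic"))
```

Selecting `"mpc"` instead would swap in the secure multi-party computation operators from the same registry, which is the sense in which the replacement rule is driven by the user's algorithm choice.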
In this scenario example, as shown in fig. 7, the processing apparatus may convert a plaintext machine learning model into a corresponding privacy machine learning model, and then perform learning and training on the privacy machine learning model by using an existing machine learning framework. A specific implementation may include the following steps.
S1, inputting a plaintext model (i.e., the initial model), which may be a usable model developed on a mainstream AI framework. Mainstream AI frameworks include, but are not limited to, TensorFlow, PyTorch, MXNet, and the like.
S2, selecting a privacy algorithm (i.e., the target privacy algorithm). The user may select the privacy algorithm to be used to privacy-encrypt the plaintext model, including but not limited to secure multi-party computation, homomorphic encryption, zero-knowledge proof, and the like.
S3, tensor conversion: the native tensors in the plaintext AI model are converted into custom tensors. The conversion principle is to scan the computational graph once, insert an operator that converts a native tensor into a custom tensor (an Encode OP, i.e., a first-type conversion operator) at each corresponding input position, and insert an operator that converts a custom tensor into a native tensor (a Decode OP, i.e., a second-type conversion operator) at each corresponding output position. The computational graphs of the model before and after tensor conversion can be seen in fig. 2.
S4, operator replacement. First, the privacy operators are implemented on various devices using cryptographic algorithms; these devices include, but are not limited to, CPUs, GPUs, and the like, and the privacy operators include, but are not limited to, secure multi-party computation operators, homomorphic operators, zero-knowledge proof operators, and the like. A corresponding gradient function is implemented for each privacy operator, and each privacy operator is associated with its privacy gradient operator and registered in the privacy AI framework. Then, according to the privacy algorithm selected by the user, the plaintext operators in the model are replaced with the privacy operators corresponding to that algorithm. For example, if the user selects a homomorphic encryption algorithm, homomorphic operators are used in place of the corresponding plaintext operators. The computational graphs of the model before and after operator replacement can be seen in fig. 3.
S5, after the privacy AI framework executes the replacement algorithm to replace the corresponding operators in the model, the plaintext model input by the user has been converted into the privacy model (i.e., the privacy machine learning model).
S6, the converted privacy model is executed using the privacy AI framework's execution engine.
S7, because the input and output tensors of the privacy operators are all custom tensors, the output of the privacy model after execution is a custom tensor.
S8, the custom tensor output by the privacy model is converted into the AI framework's native tensor type through a Decode operator, and the result is output.
In this way, on the premise of protecting data privacy, machine learning can be performed on the privacy model using the privacy AI framework to obtain a target model that meets the requirements.
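The one-pass graph scan of step S3 can be sketched over a simple list-of-nodes graph representation. The representation and helper names are hypothetical (a real AI framework's graph API would differ); the point is only the rewiring rule: an Encode OP ahead of every graph input, a Decode OP behind every graph output, and interior nodes re-pointed at the encoded values.

```python
def insert_conversion_ops(graph, graph_inputs, graph_outputs):
    """Scan the computational graph once and wrap it with tensor-conversion ops.

    graph: list of {"op": name, "inputs": [...], "output": name} nodes.
    Returns a new graph with an Encode OP (first-type conversion operator) at each
    input position and a Decode OP (second-type conversion operator) at each output.
    """
    rewired = []
    # 1) insert an Encode OP at each graph input position
    for name in graph_inputs:
        rewired.append({"op": "Encode", "inputs": [name], "output": name + "_enc"})
    # 2) rewire interior nodes to consume the encoded (custom-tensor) values
    enc = {name: name + "_enc" for name in graph_inputs}
    for node in graph:
        rewired.append({"op": node["op"],
                        "inputs": [enc.get(i, i) for i in node["inputs"]],
                        "output": node["output"]})
    # 3) insert a Decode OP at each graph output position
    for name in graph_outputs:
        rewired.append({"op": "Decode", "inputs": [name], "output": name + "_dec"})
    return rewired

plaintext_graph = [{"op": "MatMul", "inputs": ["x", "w"], "output": "y"}]
converted = insert_conversion_ops(plaintext_graph, ["x", "w"], ["y"])
for node in converted:
    print(node["op"], node["inputs"], "->", node["output"])
```

Because only the boundary ops are inserted and the interior op names are untouched, this pass composes naturally with the operator replacement of step S4, which rewrites the interior ops independently.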
As this scenario example shows, the privacy machine learning model obtained by the privacy-preserving data processing method and apparatus provided in the embodiments of the present application can well support both the target privacy algorithm and the functional mechanisms of the existing machine learning framework. The matched back-end machine learning framework requires no subsequent modification and can be used directly for privacy machine learning based on various privacy algorithms, reducing data processing costs.
Although various specific embodiments are mentioned in the disclosure of the present application, the present application is not limited to the cases described in industry standards or in the examples; implementations slightly modified from those described in certain industry standards, in a custom manner, or in the examples can also achieve the same, equivalent, or similar implementation effects, or effects that can be expected after such modification. Embodiments employing such modified or transformed data acquisition, processing, output, determination, and the like may still fall within the scope of alternative embodiments of the present application.
Although the present application provides method steps as described in an embodiment or flowchart, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an apparatus or client product in practice executes, it may execute sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded.
The devices or modules and the like explained in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, in implementing the present application, the functions of each module may be implemented in one or more pieces of software and/or hardware, or a module that implements the same function may be implemented by a combination of a plurality of sub-modules, and the like. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and other divisions may be realized in practice, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
While the present application has been described by way of examples, those of ordinary skill in the art will appreciate that there are numerous variations and permutations of the present application that do not depart from its spirit, and it is intended that the appended claims cover such variations and permutations.

Claims (18)

1. A data processing method based on privacy protection is characterized by comprising the following steps:
acquiring a plaintext machine learning model to be trained as an initial model, and determining a privacy encryption algorithm aiming at the initial model as a target privacy algorithm;
converting the target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; replacing a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule to generate a privacy machine learning model; the privacy machine learning model supports privacy machine learning based on the target privacy algorithm through a preset machine learning framework.
2. The method of claim 1, wherein the target privacy algorithm comprises at least one of: a secure multi-party calculation algorithm, a homomorphic encryption algorithm and a zero-knowledge proof algorithm.
3. The method of claim 1, wherein converting the target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule comprises:
determining, from the initial model, native tensor data related to privacy sample data in model training and/or native tensor data related to training variable data in model training as target tensor data;
and converting the target tensor data into tensor data of a user-defined tensor type matched with the target privacy algorithm according to a preset conversion rule.
4. The method of claim 3, wherein converting the target tensor data into tensor data of a custom tensor type matched with the target privacy algorithm according to a preset conversion rule comprises:
determining a first class conversion operator matched with a target privacy algorithm according to the target privacy algorithm; the first-class conversion operator is used for converting the target tensor data into a custom tensor type matched with a target privacy algorithm;
inserting said first class of conversion operators at the input of an operator associated with the target tensor data in said initial model.
5. The method of claim 4, further comprising:
determining a second class of conversion operators matched with the target privacy algorithm according to the target privacy algorithm; the second type of conversion operator is used for restoring the target tensor data of the user-defined tensor type into a native tensor type;
inserting the second class of conversion operators at the output of an operator associated with the target tensor data in the initial model.
6. The method of claim 1, wherein replacing the target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule comprises:
determining a target operator to be replaced from the initial model; wherein the target operator comprises: an operator through which privacy sample data flows and/or an operator through which training variable data flows;
and replacing the target operator with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule.
7. The method according to claim 6, wherein replacing the target operator with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule comprises:
determining a target privacy operator matched with the target privacy algorithm;
determining a corresponding target privacy gradient operator according to the target privacy operator;
after the target privacy operator and the target privacy gradient operator are associated, registering the associated target privacy operator and the target privacy gradient operator in a preset machine learning frame matched with the initial model;
replacing the target operator in the initial model with the target privacy operator by executing a replacement algorithm.
8. The method of claim 1, wherein after generating the privacy machine learning model, the method further comprises:
and training the privacy machine learning model by utilizing a preset machine learning framework matched with the initial model so as to obtain a target model meeting the requirement.
9. The method of claim 8, wherein the preset machine learning framework comprises at least one of: TensorFlow, PyTorch, MXNet.
10. The method of claim 8, wherein after obtaining the target model, the method further comprises: and carrying out data processing on target data in a target service scene by using the target model.
11. A data processing apparatus based on privacy protection, comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a plaintext machine learning model to be trained as an initial model and determining a privacy encryption algorithm aiming at the initial model as a target privacy algorithm;
the processing module is used for converting the target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; replacing a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule to generate a privacy machine learning model; the privacy machine learning model supports privacy machine learning based on the target privacy algorithm through a preset machine learning framework.
12. The apparatus of claim 11, wherein the target privacy algorithm comprises at least one of: a secure multi-party calculation algorithm, a homomorphic encryption algorithm and a zero-knowledge proof algorithm.
13. The apparatus according to claim 11, wherein the processing module includes a tensor type conversion unit configured to determine, from the initial model, as the target tensor data, the native tensor data related to the privacy sample data in the model training and/or the native tensor data related to the training variable data in the model training; and converting the target tensor data into tensor data of a user-defined tensor type matched with the target privacy algorithm according to a preset conversion rule.
14. The apparatus according to claim 11, wherein the processing module comprises a privacy operator replacing unit configured to determine a target operator to be replaced from the initial model; wherein the target operator comprises: an operator through which privacy sample data flows and/or an operator through which training variable data flows; and replacing the target operator with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule.
15. The apparatus of claim 11, further comprising a training module configured to train the privacy machine learning model to obtain a target model meeting requirements using a preset machine learning framework matched to the initial model.
16. A server comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement the steps of the method of any one of claims 1 to 10.
17. A computer-readable storage medium having stored thereon computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 10.
18. A data processing method based on privacy protection is characterized by comprising the following steps:
acquiring a plaintext machine learning model to be trained as an initial model, and determining a privacy encryption algorithm aiming at the initial model as a target privacy algorithm;
converting the target tensor data in the initial model into a custom tensor type matched with the target privacy algorithm according to a preset conversion rule; replacing a target operator in the initial model with a target privacy operator matched with the target privacy algorithm according to a preset replacement rule to generate a privacy machine learning model;
and training the privacy machine learning model by utilizing a preset machine learning framework matched with the initial model so as to obtain a target model meeting the requirement.
CN202010644238.3A 2020-07-07 2020-07-07 Data processing method and device based on privacy protection and server Pending CN111783124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010644238.3A CN111783124A (en) 2020-07-07 2020-07-07 Data processing method and device based on privacy protection and server


Publications (1)

Publication Number Publication Date
CN111783124A true CN111783124A (en) 2020-10-16

Family

ID=72759497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010644238.3A Pending CN111783124A (en) 2020-07-07 2020-07-07 Data processing method and device based on privacy protection and server

Country Status (1)

Country Link
CN (1) CN111783124A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008717A (en) * 2019-02-26 2019-07-12 东北大学 Support the decision tree classification service system and method for secret protection
CN110598722A (en) * 2018-06-12 2019-12-20 清华大学 Multi-modal neuroimaging data automatic information fusion system
CN110619220A (en) * 2019-08-09 2019-12-27 北京小米移动软件有限公司 Method and device for encrypting neural network model and storage medium
US20200082259A1 (en) * 2018-09-10 2020-03-12 International Business Machines Corporation System for Measuring Information Leakage of Deep Learning Models


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022109861A1 (en) * 2020-11-25 2022-06-02 上海阵方科技有限公司 Method, apparatus and device for preparing training data for encrypted machine learning
CN112632607A (en) * 2020-12-22 2021-04-09 中国建设银行股份有限公司 Data processing method, device and equipment
CN112632607B (en) * 2020-12-22 2024-04-26 中国建设银行股份有限公司 Data processing method, device and equipment
CN112287396A (en) * 2020-12-24 2021-01-29 北京瑞莱智慧科技有限公司 Data processing method and device based on privacy protection
CN113342346A (en) * 2021-05-18 2021-09-03 北京百度网讯科技有限公司 Operator registration method, device, equipment and storage medium of deep learning framework
CN113342346B (en) * 2021-05-18 2022-03-25 北京百度网讯科技有限公司 Operator registration method, device, equipment and storage medium of deep learning framework
US11625248B2 (en) 2021-05-18 2023-04-11 Beijing Baidu Netcom Science Technology Co., Ltd. Operator registration method and apparatus for deep learning framework, device and storage medium
CN113285960A (en) * 2021-07-21 2021-08-20 湖南轻悦健康管理有限公司 Data encryption method and system for service data sharing cloud platform
CN113672985A (en) * 2021-08-25 2021-11-19 支付宝(杭州)信息技术有限公司 Machine learning algorithm script compiling method and compiler for privacy protection
WO2023024735A1 (en) * 2021-08-25 2023-03-02 支付宝(杭州)信息技术有限公司 Compilation method for machine learning algorithm script for privacy protection, and compiler
CN113672985B (en) * 2021-08-25 2023-11-14 支付宝(杭州)信息技术有限公司 Machine learning algorithm script compiling method and compiler for privacy protection
CN114764509A (en) * 2022-06-14 2022-07-19 深圳致星科技有限公司 Interconnection and intercommunication method and device for privacy calculation, privacy data and federal learning
CN114764509B (en) * 2022-06-14 2022-08-26 深圳致星科技有限公司 Interconnection and intercommunication method and device for privacy calculation, privacy data and federal learning
CN115660049A (en) * 2022-11-02 2023-01-31 北京百度网讯科技有限公司 Model processing method, model processing device, electronic equipment and storage medium
CN115660049B (en) * 2022-11-02 2023-07-25 北京百度网讯科技有限公司 Model processing method, device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination