CN112541593B - Method and device for jointly training business model based on privacy protection - Google Patents


Info

Publication number
CN112541593B
Authority
CN
China
Prior art keywords
disturbance
network
matrix
layer
random
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011409592.4A
Other languages
Chinese (zh)
Other versions
CN112541593A (en)
Inventor
熊涛 (Xiong Tao)
冯岩 (Feng Yan)
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202011409592.4A (granted as CN112541593B)
Priority to CN202210742526.1A (published as CN114936650A)
Publication of CN112541593A
Application granted
Publication of CN112541593B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioethics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

According to the method, a server determines a perturbation matrix for each network layer in the neural network implementing the business model, perturbation-encrypts the parameters of each network layer with its perturbation matrix to obtain a perturbation-encrypted model, and distributes this model to each terminal. Each terminal processes its local training samples using the perturbation-encrypted model to obtain perturbed gradients, and superimposes noise on them. By carefully designing the distribution of this noise, the noise remaining after the perturbation matrix is removed conforms to a Gaussian distribution, so the requirement of differential privacy is met. The server can then perform perturbation recovery and aggregation on the noisy gradients sent by the terminals in order to update the parameters of the neural network model.

Description

Method and device for jointly training business model based on privacy protection
Technical Field
One or more embodiments of the present disclosure relate to the field of machine learning, and more particularly, to a model joint training method and apparatus for protecting privacy in a distributed system.
Background
The rapid development of machine learning has enabled machine learning models to be applied in a wide variety of business scenarios. Since a model's prediction performance depends on the abundance and availability of its training samples, obtaining a better-performing business prediction model generally requires jointly training the model with training data from multiple platforms.
Specifically, in a scenario where data is partitioned vertically, multiple platforms may hold different feature data for the same batch of business objects. For example, in a machine-learning-based merchant classification analysis scenario, an electronic payment platform holds merchants' transaction flow data, an e-commerce platform stores their sales data, and a banking institution holds their loan data. In a scenario where data is partitioned horizontally, multiple platforms may each hold the same attribute features for different business objects; for example, banking institutions in different regions each hold loan data for locally registered merchants. There are, of course, also cases combining vertical and horizontal partitioning.
The training data local to each platform often contains private information about local business objects, especially user privacy. Furthermore, a local model trained on local training data may also risk leaking local data features. Therefore, in scenarios where multiple parties jointly train a model, data security and data privacy are a major challenge.
It is therefore desirable to provide an improved scheme that, when multiple parties jointly train a business model, ensures that no party's private data is leaked and that data security is maintained.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for jointly training a business model, which protect private data from leakage and ensure data security by perturbation-encrypting the model and adding noise to the gradients.
According to a first aspect, there is provided a method for jointly training a business model based on privacy protection, the business model being implemented by a neural network, the method being performed by a server and comprising:
determining corresponding random disturbance matrixes for a plurality of network layers in the neural network;
carrying out disturbance processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain a disturbance encryption parameter matrix of the network layer;
sending a disturbance encryption model to a plurality of terminals, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes corresponding to the plurality of network layers;
receiving confusion gradient items respectively corresponding to the plurality of network layers from a first terminal, the first terminal being any one of the plurality of terminals, wherein the confusion gradient items are obtained by superposing second noise on first noise gradient items, the first noise gradient items are obtained by processing a first sample set local to the first terminal using the perturbation encryption model, and the combination result of the second noise and the random perturbation matrix of the corresponding network layer satisfies a Gaussian distribution;
restoring the confusion gradient items of the corresponding network layers by using the random disturbance matrixes corresponding to the network layers to obtain gradient restoration results of the network layers;
and aggregating the gradient recovery results corresponding to the plurality of terminals, and updating the current parameter matrixes of the plurality of network layers according to the aggregation result.
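The encrypt-then-recover flow on the server side can be sketched minimally in NumPy, assuming a simple element-wise multiplicative perturbation (the patent's concrete construction derives each R^(l) from per-layer random vectors; the shapes and key distribution here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# current parameter matrices W^(l) of three network layers
weights = [rng.normal(size=s) for s in [(4, 5), (5, 3), (3, 2)]]

# step 1: one random perturbation matrix R^(l) per layer, same shape as W^(l)
Rs = [rng.uniform(0.5, 2.0, size=W.shape) for W in weights]

# step 2: perturbation-encrypt by element-wise combination; only these
# perturbed matrices would be sent to the terminals
enc = [W * R for W, R in zip(weights, Rs)]

# recovery step: the server holds the keys, so quantities expressed in the
# perturbed coordinates can be mapped back exactly
dec = [We / R for We, R in zip(enc, Rs)]
```

The perturbed matrices mask the true parameter values, yet the key holder inverts the transform without loss.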
According to an embodiment, determining, for a plurality of network layers in the neural network, a corresponding random perturbation matrix specifically includes: for each network layer of the neural network, determining a corresponding random vector, wherein the dimensionality of the random vector is the same as the number of neurons in the corresponding network layer; and determining a random disturbance matrix corresponding to a first network layer according to the random vector of the first network layer and the random vector of the adjacent network layer, wherein the first network layer is an intermediate layer in the neural network.
In one embodiment, the neural network comprises N actual network layers to be trained, and N-1 transition network layers inserted between adjacent actual network layers, wherein each transition network layer has a fixed identity matrix as its parameter matrix; in such a case, determining the corresponding random vectors specifically includes: determining a first random vector for each intermediate layer among the actual network layers; and determining a second random vector for each transition network layer; wherein vector elements in the first random vector and the second random vector have different data distributions.
Further, in an example, each vector element in the first random vector conforms to a ternary decomposition data distribution of a gaussian distribution; the reciprocal of each vector element in the second random vector accords with the ternary decomposition data distribution of Gaussian distribution; the first network layer is an intermediate layer in an actual network layer; under such a condition, determining the random disturbance matrix corresponding to the first network layer specifically includes: and respectively combining each vector element in the first random vector corresponding to the first network layer with the reciprocal of each vector element in the second random vector corresponding to the previous transition network layer of the first network layer, and taking the combined result as each matrix element in the random disturbance matrix corresponding to the first network layer.
In one embodiment, determining, for each network layer of the neural network, a corresponding random vector further comprises: determining a third random vector aiming at an input layer in an actual network layer, wherein each vector element accords with the binary decomposition data distribution of Gaussian distribution; and, the step of determining the random perturbation matrix further comprises: and taking the elements in the third random vector corresponding to the input layer as matrix elements to obtain a random disturbance matrix corresponding to the input layer.
In one embodiment, for the last transition network layer, determining a last second random vector, wherein the reciprocal of each vector element conforms to the binary decomposition data distribution of the gaussian distribution; and the step of determining the corresponding random perturbation matrix further comprises: and aiming at an output layer in the actual network layer, taking the reciprocal of each element in the last second random vector as a matrix element to obtain a random disturbance matrix corresponding to the output layer.
According to one embodiment, the aforementioned plurality of network layers are the N actual network layers to be trained.
According to an embodiment, performing perturbation processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain the disturbance encryption parameter matrix of the network layer specifically includes: for each network layer other than the output layer among the N actual network layers, combining elements at corresponding positions of the random disturbance matrix and the current parameter matrix of the network layer to obtain the disturbance encryption parameter matrix of the network layer; and for the output layer, combining elements at corresponding positions of the random disturbance matrix corresponding to the output layer and the current parameter matrix of the output layer, and superposing an additional disturbance matrix for the output layer, to obtain the disturbance encryption parameter matrix of the output layer.
In various embodiments, the business model is used to predict business objects, the business objects including one of: user, merchant, transaction, image, text, audio.
According to a second aspect, there is provided a method for jointly training a business model based on privacy protection, the business model being implemented by a neural network, the method being performed by a first terminal and comprising:
receiving a disturbance encryption model from a server, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes respectively corresponding to a plurality of network layers in the neural network, and the disturbance encryption parameter matrixes are obtained by performing disturbance processing on current parameter matrixes of the network layers by using random disturbance matrixes of the corresponding network layers;
processing a first sample set local to the first terminal by using the disturbance encryption model to obtain a first noise gradient item aiming at each network layer in the plurality of network layers;
superposing second noise on the first noise gradient term to obtain a confusion gradient term aiming at each network layer; wherein a combined result of the second noise and the random disturbance matrix of the corresponding network layer satisfies a Gaussian distribution;
and sending the confusion gradient items to the server, so that the server recovers the confusion gradient items of the corresponding network layers by using the random disturbance matrixes, and aggregates the recovered gradients corresponding to a plurality of terminals, thereby updating the current parameter matrixes of the plurality of network layers.
In one embodiment, the neural network comprises N actual network layers to be trained, and N-1 transition network layers inserted between adjacent actual network layers, wherein the transition network layers have fixed identity matrixes as parameter matrixes thereof; the plurality of network layers are the N actual network layers to be trained.
According to one embodiment, the first noise gradient term is obtained by: inputting the characteristic data of each sample in the first sample set into the disturbance encryption model to obtain disturbance output of each network layer; and obtaining the first noise gradient term according to the disturbance output of each network layer, the label data of each sample and a preset loss function.
In one embodiment, superimposing second noise on the first noise gradient term specifically includes: determining a noise matrix corresponding to each network layer; scaling the noise matrix by a preset noise amplitude and variance to serve as the second noise corresponding to each network layer; and superposing the second noise on the first noise gradient item, wherein the noise amplitude is not less than the norm of the gradient.
Further, determining the noise matrix corresponding to each network layer may include: for an intermediate layer in the plurality of network layers, determining a first noise matrix, wherein each matrix element satisfies a ternary decomposition data distribution of a Gaussian distribution; for an input layer and an output layer of the plurality of network layers, a second noise matrix is determined, wherein each matrix element satisfies a binary decomposition data distribution of a Gaussian distribution.
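The amplitude-and-variance scaling just described can be sketched as follows. A standard Gaussian noise matrix is used here as a simplification; the claims instead draw the matrix elements from special ternary/binary decomposition distributions so that the noise remains Gaussian after perturbation recovery. The function name and `sigma` parameter are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def superimpose_second_noise(grad, sigma, amplitude=None):
    # The noise amplitude C is chosen no smaller than the norm of the
    # gradient (per the claim above); the noise matrix is scaled by
    # C * sigma and superimposed on the first noise gradient term.
    C = max(np.linalg.norm(grad), amplitude or 0.0)
    noise_matrix = rng.standard_normal(grad.shape)
    return grad + C * sigma * noise_matrix

g = rng.normal(size=(5, 3))              # a first noise gradient term
confusion = superimpose_second_noise(g, sigma=0.1)
```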
According to a third aspect, there is provided an apparatus for jointly training a business model based on privacy protection, where the business model is implemented by a neural network, and the apparatus is deployed in a server, and includes:
a disturbance matrix determination unit configured to determine, for a plurality of network layers in the neural network, corresponding random disturbance matrices;
the disturbance encryption unit is configured to perform disturbance processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain a disturbance encryption parameter matrix of the network layer;
a sending unit configured to send a disturbed encryption model to a plurality of terminals, where the disturbed encryption model includes disturbed encryption parameter matrices corresponding to the plurality of network layers;
a receiving unit, configured to receive confusion gradient terms respectively corresponding to the plurality of network layers from a first terminal, the first terminal being any one of the plurality of terminals, where the confusion gradient terms are obtained by superimposing second noise on first noise gradient terms, the first noise gradient terms are obtained by processing a first sample set local to the first terminal using the perturbation encryption model, and the combination result of the second noise and the random perturbation matrix of the corresponding network layer satisfies a Gaussian distribution;
the disturbance recovery unit is configured to recover the confusion gradient items of the corresponding network layers by using the random disturbance matrixes corresponding to the plurality of network layers to obtain gradient recovery results of the plurality of network layers;
and the aggregation updating unit is configured to aggregate the gradient recovery results corresponding to the plurality of terminals, and update the current parameter matrixes of the plurality of network layers according to the aggregation result.
According to a fourth aspect, there is provided an apparatus for jointly training a business model based on privacy protection, where the business model is implemented by a neural network, and the apparatus is deployed in a first terminal, and includes:
the receiving unit is configured to receive a disturbance encryption model from a server, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes respectively corresponding to a plurality of network layers in the neural network, and the disturbance encryption parameter matrixes are obtained by performing disturbance processing on current parameter matrixes of the network layers by using random disturbance matrixes of the corresponding network layers;
a gradient obtaining unit configured to process a first sample set local to the first terminal by using the perturbation encryption model to obtain a first noise gradient item for each of the plurality of network layers;
the noise adding unit is configured to superimpose second noise on the first noise gradient term to obtain a confusion gradient term aiming at each network layer; wherein a combined result of the second noise and the random disturbance matrix of the corresponding network layer satisfies a Gaussian distribution;
and the sending unit is configured to send the confusion gradient items to the server, so that the server recovers the confusion gradient items of the corresponding network layers by using the random disturbance matrix, and aggregates the recovery gradients corresponding to the plurality of terminals, thereby updating the current parameter matrixes of the plurality of network layers.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.
According to a sixth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first or second aspect.
According to the method and apparatus provided by the embodiments of this specification, in each iteration of model updating, the server perturbation-encrypts the parameters of each network layer before sending them to the terminals. A terminal can directly process its sample data based on the perturbation-encrypted model to obtain perturbed gradients. The perturbation encryption of the model parameters ensures the security of the parameters, and, because the terminal can process samples directly without decrypting, it greatly saves the terminal's computing resources. In addition, after obtaining the perturbed gradients, the terminal adds noise to them, and the final noise remaining after perturbation recovery conforms to a Gaussian distribution, meeting the requirement of differential privacy. As a result, neither the server nor any other party can obtain the true values of the gradients, further ensuring the security of the private data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 illustrates an example scenario for business model training via federated learning;
FIG. 2 illustrates a flow diagram of a business model training method to protect privacy, according to one embodiment;
FIG. 3 shows a schematic diagram of a plurality of network layers in a neural network;
FIG. 4 shows a schematic diagram of an extended neural network;
FIG. 5 shows a schematic diagram of a training apparatus deployed in a server, according to one embodiment;
fig. 6 shows a schematic diagram of a training apparatus deployed in a terminal according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
FIG. 1 illustrates an example scenario of business model training through federated learning. In the schematic scenario of fig. 1, the distributed system includes a server and N terminals, each having a training sample for model training. The terminal may be a data platform of a large organization, such as a bank, a hospital, a payment platform, etc., or a small device, such as a personal PC, a smart phone, an IoT device, etc. The device types of the respective terminals may be the same or different.
The business model to be trained is used for predicting business objects, where a business object may be any of various objects such as a user, merchant, transaction, image, text, or audio. For model training, each terminal has its own sample set of business objects, where each sample includes feature information of a business object as sample features, and further includes a label corresponding to the prediction target; the label may be a classification label or a regression-value label. For example, in one specific example, the business object is a user represented by an account. Accordingly, sample features may include, for example, the registration duration of the account, registration information, frequency of use over a recent period, frequency of comments made, and so on; the label may be a user classification label, for example indicating the group to which the user belongs, or indicating whether the account is an abnormal account (spam account, paid-poster account, stolen account, etc.). In another example, the business object is a transaction. Accordingly, the sample features may include, for example, transaction amount, transaction time, payment channel, attribute information of the transacting parties, and the like. This specification does not limit the business objects, and the various cases are not described in further detail.
In a typical federated learning process, the server determines the structure and algorithm of the business model to be trained, initializes it, and then updates the model through multiple iterations. In each iteration, the server issues the current parameters W of the model to each terminal. Each terminal i trains the model with the current parameters W on its local training samples to obtain the gradient G_i of the model parameters, i.e., the amount of change or update of the parameters. Terminal i then transmits this gradient information to the server. The server obtains N pieces of gradient information from the N terminals, aggregates them to obtain a comprehensive gradient for the current iteration, and updates the model parameters according to the comprehensive gradient, until a training end condition is reached. In this way, the server and the N terminals cooperatively realize joint training of the model.
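The plain (unprotected) iteration just described can be sketched end to end; the least-squares loss and the synthetic terminal data are illustrative stand-ins for the business model, not the patent's concrete protocol:

```python
import numpy as np

def local_gradient(W, X, y):
    # Terminal i: gradient G_i of a least-squares loss on its local samples
    # (a stand-in for the business model's true loss).
    return X.T @ (X @ W - y) / len(X)

def federated_round(W, terminals, lr=0.1):
    # Server issues current parameters W; each terminal returns its gradient;
    # the server aggregates by averaging and takes one update step.
    grads = [local_gradient(W, X, y) for X, y in terminals]
    return W - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 1))
terminals = []
for _ in range(4):                       # N = 4 terminals, horizontal split
    X = rng.normal(size=(50, 3))
    terminals.append((X, X @ W_true))

W = np.zeros((3, 1))
for _ in range(300):                     # iterate until convergence
    W = federated_round(W, terminals)
```

After enough rounds the jointly trained parameters approach the generating parameters, although, as discussed next, exchanging plaintext gradients this way leaks information.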
However, implementing the above federal learning procedure unprotected presents a risk of privacy disclosure. On one hand, the parameter gradient obtained based on the local sample may carry information of the local sample, and the local sample is often privacy data that needs to be protected by the data platform, for example, personal information of the user, bank flow, medical records, and the like. A malicious attacker may learn the information of the training sample from the gradient information of the interaction between the terminal and the server. Or, one terminal may reversely deduce gradient information of other terminals from parameter information issued by the server, so as to infer training sample data of the terminal. In addition, there is a privacy problem in the output of the model, that is, an attacker may guess whether a training sample contains a specific data record by querying the model during or after training.
In order to protect the privacy data of each terminal in the federal learning process, the inventor proposes that the server adopts a disturbance encryption algorithm to carry out disturbance encryption on model parameters and then distributes the model parameters to each terminal. And the terminal processes the local training sample by using the disturbance encryption model to obtain a disturbance gradient. In order to avoid that the server recovers the true gradient by using a decryption mode corresponding to the disturbance encryption algorithm, the terminal also superimposes noise on the disturbance gradient. By carefully designing the distribution of the noise, the noise after disturbance decryption conforms to Gaussian distribution, thereby meeting the requirement of differential privacy protection. In this way, the true plaintext values of the model parameters and the gradients are not exposed during the federal learning process, thereby ensuring the security of the private data.
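A toy instance of the key property above: with a multiplicative perturbation, the terminal can add noise without knowing the key, yet the noise after the server's recovery is still exactly Gaussian. Here the secret perturbation entries are random signs, a drastic simplification of the patent's ternary/binary decomposition distributions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Secret perturbation entries R_ij in {-1, +1}, held only by the server.
# The terminal adds standard Gaussian noise z without knowing R; after the
# server's recovery the noise becomes z / R = z * R, which is again exactly
# standard Gaussian because N(0, 1) is symmetric about zero.
n = 200_000
R = rng.choice([-1.0, 1.0], size=n)      # secret key
z = rng.standard_normal(n)               # terminal's noise, drawn without R
recovered_noise = z / R
```

The empirical mean and standard deviation of `recovered_noise` match a standard Gaussian, illustrating how differential-privacy noise can survive perturbation recovery.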
The following describes a specific implementation of the above concept.
FIG. 2 illustrates a flow diagram of a business model training method that protects privacy, according to one embodiment. The flow involves a server and a first terminal; the server may be implemented by any apparatus, device, or device cluster having computing and processing capabilities. The first terminal is any one of the plurality of terminals participating in federated learning and may be of any device type. Only one terminal is shown in the figure for simplicity and clarity. However, it should be understood that each of the multiple terminals in the federated learning process can implement the training process in the manner of the first terminal.
The business model can be implemented as a neural network, such as a multi-layer fully-connected deep neural network DNN (or called multi-layer perceptron MLP), a convolutional neural network CNN, and so on. The following is described in detail with reference to an example of a multi-layered perceptron MLP.
FIG. 3 shows a schematic diagram of a plurality of network layers in a neural network. Assume the neural network contains L network layers, where an arbitrary l-th network layer is denoted layer l. When l = 1 it is the input layer, when l = L it is the output layer, and otherwise it is an intermediate layer. In the case of an MLP, each neuron in a network layer (other than the output layer) is connected to each neuron in the next network layer. Suppose layer l has n_l neurons and its preceding layer l-1 has n_{l-1} neurons; then there are n_{l-1} × n_l connecting edges from layer l-1 to layer l, each carrying a corresponding weight that serves as a model parameter to be trained. Thus, layer l has n_{l-1} × n_l model parameters, which constitute a parameter matrix of dimension n_{l-1} × n_l, denoted W^(l).
For a neural network that employs Relu as the neuron activation function, the output y^(l) of each network layer l can be expressed by the following formula:

    y^(l) = Relu( (W^(l))^T · y^(l-1) )
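A minimal sketch of this per-layer computation for a bias-free Relu MLP; the layer sizes are arbitrary example values:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(x, weights):
    # weights holds W^(1) ... W^(L); each W^(l) is the n_{l-1} x n_l
    # parameter matrix, and y^(l) = Relu((W^(l))^T y^(l-1)) with y^(0) = x.
    y = x
    for W in weights:
        y = relu(W.T @ y)
    return y

rng = np.random.default_rng(1)
ws = [rng.normal(size=(4, 5)), rng.normal(size=(5, 3)), rng.normal(size=(3, 2))]
out = mlp_forward(rng.normal(size=4), ws)
```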
In the model training process, the parameter matrix W^(l) of each layer needs to be updated through multiple iterations. FIG. 2 shows the process by which such an update is performed in any one iteration. The specific implementation is described below.
First, at step 21, the server determines corresponding random perturbation matrices for a plurality of network layers in the neural network. For network layer l, the server determines a corresponding random perturbation matrix R^(l), which serves as a key for perturbation-encrypting the corresponding parameter matrix W^(l). In general, perturbation encryption masks the true value of the encrypted object by means of a perturbation transform; the effect of the transform can subsequently be eliminated with the corresponding inverse transform. Here, in order to perturbation-encrypt the parameter matrix, the random perturbation matrix R^(l) is generally required to have the same dimensions as the parameter matrix W^(l).
Depending on the specific mode of the perturbation transform, different algorithms can be adopted to determine the perturbation matrix. In one embodiment, the corresponding random perturbation matrix may be independently and randomly generated for each network layer. Specifically, for the network layer l, a corresponding number of matrix elements may be randomly generated according to the dimensions of that layer's parameter matrix, and used as the corresponding random perturbation matrix R^(l).
In another embodiment, corresponding random vectors may be determined for the network layers of the neural network, and the perturbation matrix for each network layer may then be generated based on these random vectors. Specifically, for the network layer l, a random vector r^(l) may be generated such that its dimension equals the number n_l of neurons in layer l. The elements of the random vectors of adjacent network layers are then combined to obtain the corresponding random perturbation matrix.

For example, assume that network layer l has an n_l-dimensional random vector r^(l) and the previous network layer l-1 has an n_(l-1)-dimensional random vector r^(l-1); then r^(l) and r^(l-1) can be combined to obtain a random perturbation matrix of dimension n_(l-1) × n_l for layer l. For the input layer and the output layer, special settings can be made to generate the corresponding random perturbation matrices.
In one specific example, for any network layer l, the random perturbation matrix R^(l) is generated from the random vectors as follows:

R^(l)_ij = r^(1)_j,               for l = 1;
R^(l)_ij = r^(l)_j / r^(l-1)_i,   for 2 <= l <= L-1;
R^(l)_ij = 1 / r^(L-1)_i,         for l = L.    (2)

That is, for the input layer l = 1, the elements of the random vector r^(1) are used as matrix elements to obtain the random perturbation matrix R^(1) corresponding to the input layer.

For an intermediate layer 2 <= l <= L-1, each vector element r^(l)_j of the random vector r^(l) corresponding to layer l is combined with the reciprocal of each vector element r^(l-1)_i of the random vector r^(l-1) corresponding to the previous network layer l-1, and the combined results serve as the matrix elements R^(l)_ij of the random perturbation matrix R^(l) corresponding to layer l.

For the output layer l = L, the reciprocals of the elements of the random vector r^(L-1) corresponding to the previous layer L-1 are used as matrix elements, giving the random perturbation matrix R^(L) corresponding to the output layer.
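As a hedged NumPy sketch of the construction in formula (2) (the helper name is illustrative; uniform positive random vectors are used here only for readability, whereas the actual element distributions are constrained later, in formulas (17) and (18)):

```python
import numpy as np

def perturbation_matrices(r, sizes):
    """Build R(l) per formula (2) from per-layer random vectors.

    sizes = [n_0, ..., n_L]; r[l-1] is the random vector r(l) of layer l,
    of length n_l, given for layers l = 1 .. L-1.
    """
    L = len(sizes) - 1
    R = [np.tile(r[0], (sizes[0], 1))]                 # input layer: R(1)_ij = r(1)_j
    for l in range(2, L):                              # middle layers 2..L-1
        R.append(np.outer(1.0 / r[l - 2], r[l - 1]))   # R(l)_ij = r(l)_j / r(l-1)_i
    R.append(np.tile((1.0 / r[L - 2])[:, None], (1, sizes[L])))  # R(L)_ij = 1/r(L-1)_i
    return R

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]                                   # n_0..n_3 (L = 3 layers)
r = [rng.uniform(0.5, 2.0, size=n) for n in sizes[1:-1]]
R = perturbation_matrices(r, sizes)
```

Each R^(l) then has the same n_(l-1) × n_l dimensions as the corresponding parameter matrix W^(l), as required.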
The above exemplifies an effective method of determining the random perturbation matrix. In other examples, other ways of combining the random vector elements may also be used; for instance, for an intermediate layer, the elements of the random vectors of the two adjacent layers may be multiplied together to obtain the corresponding random perturbation matrix.
In one embodiment, a special random vector may be generated for the output layer l = L, so that an additional perturbation matrix is constructed on top of the above random perturbation matrix for subsequent perturbation encryption.
Specifically, for an output layer comprising n_L neurons, two n_L-dimensional random vectors γ and r_a can be generated, where the elements of r_a are pairwise distinct, and the elements of γ can be divided into several mutually non-overlapping groups whose elements are identical within each group, with γ_i denoting the i-th group. An additional perturbation matrix R_a for the output layer is then constructed from the random vectors γ and r_a, as given by formula (3).
It is to be understood that the random perturbation matrix of each network layer may be predetermined and kept unchanged over multiple iterations, or generated afresh at each iteration. To better guarantee privacy, the random perturbation matrices are preferably regenerated in each iteration.
Having determined the random perturbation matrix of each network layer, it can be used as the key for perturbation-encrypting that layer's parameter matrix. Accordingly, in step 22, the random perturbation matrix is used to perturb the current parameter matrix of the corresponding network layer, obtaining the perturbation encryption parameter matrix of that layer.
According to one embodiment, in this step, for each network layer l, the random perturbation matrix R^(l) corresponding to that layer is combined element-wise with its current parameter matrix W^(l) (for example by multiplicative or additive combination), and the resulting matrix is used as the perturbation encryption parameter matrix Ŵ^(l) of that network layer. In the following, a "^" symbol above a variable denotes the perturbed version of that variable.
In one embodiment, a special setting is made for the output layer. For the output layer L, the random perturbation matrix R^(L) corresponding to it is combined element-wise with its current parameter matrix W^(L), and the additional perturbation matrix R_a for the output layer is superimposed, yielding the perturbation encryption parameter matrix of the output layer.
In a specific example, the process of perturbation-encrypting the parameter matrix of each layer with the random perturbation matrix may be expressed as:

Ŵ^(l) = W^(l) ⊙ R^(l),          for 1 <= l <= L-1;
Ŵ^(L) = W^(L) ⊙ R^(L) + R_a,    for l = L,    (4)

where the operator ⊙ denotes the Hadamard product, i.e. the multiplication of corresponding position elements.

Therefore, according to the example of formula (4), for the first L-1 network layers, the parameter matrix of each layer is multiplied element-wise by its random perturbation matrix to serve as the perturbation encryption parameter matrix; for the output layer L, the additional perturbation matrix R_a is superimposed on the element-wise product to obtain the perturbation encryption parameter matrix.

It should be noted that formula (4) shows one effective perturbation encryption method; in other examples, other perturbation encryption algorithms may be used.

Thus, through the above process, the perturbation encryption parameter matrices Ŵ^(l) of the network layers are obtained, realizing the perturbation encryption of the neural network.
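Formula (4) itself reduces to a few lines of NumPy; the sketch below uses made-up two-layer shapes, and Ra stands in for the additional matrix of formula (3), whose construction is not reproduced here:

```python
import numpy as np

def encrypt_parameters(weights, R, Ra):
    """Formula (4): Hadamard product with R(l) per layer; the output layer
    additionally has Ra superimposed."""
    enc = [W * Rl for W, Rl in zip(weights[:-1], R[:-1])]
    enc.append(weights[-1] * R[-1] + Ra)
    return enc

# Tiny example with made-up 2-layer shapes
W = [np.ones((2, 3)), np.ones((3, 2))]
R = [np.full((2, 3), 2.0), np.full((3, 2), 4.0)]
Ra = np.full((3, 2), 0.5)
enc = encrypt_parameters(W, R, Ra)
```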
Then, in step 23, the server sends the perturbation encryption model to a plurality of terminals, the perturbation encryption model comprising the perturbation encryption parameter matrices corresponding to the plurality of network layers. The plurality of terminals may be the terminals participating in the current round of iterative training, and the first terminal may be any one of them.
The following describes a processing procedure after the terminal side receives the perturbation encryption model, taking the first terminal as an example.
As described above, the first terminal receives the perturbation encryption model through step 23. Then, at step 24, the first terminal processes its local first sample set using the perturbation encryption model to obtain a first noise gradient term for each of the plurality of network layers.
Specifically, the first terminal may input the feature data of each sample in the first sample set into the perturbation encryption model and obtain, through forward propagation, the perturbation output ŷ^(l) of each network layer l. Continuing the ReLU MLP example of formula (1), the perturbation output of each network layer l can be expressed as:

ŷ^(l) = ReLU( (Ŵ^(l))^T · ŷ^(l-1) )    (5)
In the case where the perturbation encryption parameter matrices are obtained with formula (4), comparing the true output of each network layer in formula (1) with the perturbation output in formula (5), it can be deduced that the true output and the perturbation output satisfy the relation given in formula (6), in which a quantity α appears that is determined by γ and r_a, the special random vectors generated for the output layer and used in the aforementioned formula (3).
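To see why the extra output-layer mask R_a matters, note that, under the simplifying assumption of positive random vectors (used here purely for illustration; the patent's actual element distributions follow formulas (17) and (18)), the matrices of formula (2) alone cancel out through the network by the positive homogeneity of ReLU (ReLU(c·x) = c·ReLU(x) for c > 0): each hidden layer's perturbation output is an element-wise rescaling of the true one, and the output layer would reproduce the true output exactly. A hedged numerical check of this cancellation, with made-up sizes and weights:

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda x: np.maximum(x, 0.0)
sizes = [4, 5, 3, 2]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
r = [rng.uniform(0.5, 2.0, size=n) for n in sizes[1:-1]]   # positive r(1), r(2)

# Perturbation matrices per formula (2)
R = [np.tile(r[0], (sizes[0], 1)),                         # input layer
     np.outer(1.0 / r[0], r[1]),                           # middle layer
     np.tile((1.0 / r[1])[:, None], (1, sizes[-1]))]       # output layer

x = rng.normal(size=sizes[0])
y_true, y_pert = x, x
for W, Rl in zip(weights, R):
    y_true = relu(W.T @ y_true)
    y_pert = relu((W * Rl).T @ y_pert)

# Without Ra, the perturbation cancels completely at the output layer
assert np.allclose(y_true, y_pert)
```

This is precisely why the additional matrix R_a of formula (3) is superimposed on the output layer in formula (4): otherwise the perturbation-encrypted model would leak the true model output.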
On the basis of the perturbation outputs, back propagation is performed using the label data of each sample and a preset loss function, to obtain a first noise gradient term relating to the gradient of each network layer.
In one embodiment, the loss function is the mean squared error (MSE) loss. In this case, the perturbation loss L̂ is as follows:

L̂ = || ŷ^(L) − ȳ ||²    (7)

where ȳ represents the label data.
From this, the relationship between the true gradient ∇W^(l) and the perturbation gradient ∇Ŵ^(l) can be derived, as given in formula (8), in which V = r^T r and two perturbation terms σ^(l) and β^(l) appear, defined by formulas (9) and (10) respectively; the quantity α in formulas (9) and (10) is the same as defined in formula (6).

Therefore, in the case of the MSE loss function, in order to allow the server to perform perturbation recovery on the gradient, the first noise gradient term determined by the first terminal may include the perturbation gradient ∇Ŵ^(l), the perturbation term σ^(l) of formula (9), and the perturbation term β^(l) of formula (10).
Similarly, but with slight differences, in the case where the loss function takes the form of the cross-entropy (CE) loss, it can be shown that the true and perturbation gradients satisfy the relation given in formula (11), in which a perturbation term ψ^(l) appears, defined by formula (12); β^(l) is as defined by formula (10), and the other parameter terms are as defined in formula (8).

Therefore, in the case of the cross-entropy CE loss function, in order to enable the server to perform perturbation recovery on the gradient, the first noise gradient term determined by the first terminal may include the perturbation gradient ∇Ŵ^(l), the perturbation term ψ^(l) of formula (12), and the perturbation term β^(l) of formula (10).
However, as previously mentioned, in order to protect data privacy it is not desirable for the server to be able to fully recover the true value of the resulting gradient. Therefore, next, in step 25, a second noise is superimposed on the above first noise gradient term to obtain a confusion gradient term for each network layer; and, to meet the requirement of differential privacy, the combination of the second noise with the random perturbation matrix of the corresponding network layer is made to satisfy a Gaussian distribution.
Here, the Gaussian mechanism of differential privacy is briefly introduced. A Gaussian noise algorithm M satisfying differential privacy has the form:

M(d) = f(d) + N(0, S_f² σ²)    (13)

where f is the query function, d is the input data, and N(0, S_f² σ²) is additive Gaussian noise following a normal (Gaussian) distribution with mean 0 and standard deviation S_f σ. Here S_f, the sensitivity of the query function f, is defined as the maximum difference between the results obtained when adjacent data d and d' are input to f.
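As a generic sketch of the Gaussian mechanism of formula (13) (not patent-specific; the example query and its sensitivity bound are assumptions chosen for illustration):

```python
import numpy as np

def gaussian_mechanism(f, d, sensitivity, sigma, rng):
    """Formula (13): release f(d) plus N(0, (S_f * sigma)^2) noise.

    `sensitivity` is S_f, the maximum change of f between adjacent inputs d, d'.
    """
    value = np.asarray(f(d), dtype=float)
    return value + rng.normal(0.0, sensitivity * sigma, size=value.shape)

rng = np.random.default_rng(0)
# Example query: the mean of a dataset bounded in [0, 1]; changing one of the
# n records changes the mean by at most 1/n, so S_f = 1/n here.
data = rng.uniform(size=100)
noisy_mean = gaussian_mechanism(np.mean, data, sensitivity=1.0 / len(data),
                                sigma=2.0, rng=rng)
```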
We take the first noise gradient term of the cross-entropy (CE) form as an example to illustrate the addition of the second noise. Assume that for layer l, in accordance with the Gaussian mechanism, noise of amplitude S_app based on a noise matrix ε^(l) is added to the perturbation gradient in the first noise gradient term; the result of the perturbation recovery performed on the server side is then given by formula (14). That is, the noise matrix ε^(l), multiplied by the noise amplitude and the variance parameter, is used as the second noise added to the first noise gradient term; after the server performs perturbation recovery with the random perturbation matrix R^(l), a final noise ε̃^(l) results, which is the combination of the second noise with the corresponding random perturbation matrix R^(l). In order for this final noise to follow a Gaussian distribution and thereby provide differential privacy protection, the following conditions need to be met.
First, the upper bound of the sensitivity is S_app; it is therefore required that S_app >= ||∇W^(l)||_∞, i.e., the noise amplitude is not smaller than the (infinity-order) norm of the gradient.

In addition, R^(l) ⊙ ε^(l) should satisfy a Gaussian distribution; that is, the combination R^(l)_ij · ε^(l)_ij of each matrix element of the random perturbation matrix R^(l) with the corresponding element of the noise matrix ε^(l) should satisfy a Gaussian distribution. For this purpose, the decomposition principle of the Gaussian distribution is used.
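The two conditions can be sketched as follows; the helper name is illustrative, S_app is taken here as the maximum absolute gradient entry (so the amplitude condition holds by construction), and a zero noise matrix keeps the check deterministic, since sampling ε^(l) from the DN(k) distributions of formula (16) is not reproduced here:

```python
import numpy as np

def add_second_noise(grad_pert, R, sigma, eps):
    """Superimpose second noise on a perturbation gradient.

    The second noise is S_app * sigma * eps; after server-side recovery the
    final noise is R * (S_app * sigma * eps), whose elements the patent's
    DN(k) distribution constraints are meant to make Gaussian.
    """
    s_app = np.max(np.abs(grad_pert))     # amplitude >= infinity-order norm
    second = s_app * sigma * eps
    return grad_pert + second, R * second

grad = np.array([[1.0, -3.0], [2.0, 0.0]])
R = np.ones_like(grad)
eps = np.zeros_like(grad)                 # zero noise: confusion term == gradient
noisy, final_noise = add_second_noise(grad, R, sigma=1.0, eps=eps)
```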
It can be mathematically proven (see, for example, Iosif Pinelis. 2018. The exp-normal distribution is infinitely divisible. CoRR (2018)) that, if Z is a Gaussian-distributed variable and k is a natural number, Z can be decomposed into the product of k independent identically distributed variables W_1, ..., W_k:

Z = W_1 · W_2 · ... · W_k    (15)

with the distribution of each factor W_i given by formula (16), in which G_(1/k),0, G_(1/k),1, ... are independent identically distributed variables, each following the Gamma distribution Γ(1/k, 1) with shape parameter 1/k and scale parameter 1.
Hereinafter, such a decomposition of a Gaussian distribution into k factor variables is referred to as a k-ary decomposition, and the distribution of each factor variable is referred to as the k-ary decomposition data distribution of the Gaussian distribution, denoted DN(k). This principle can be used to solve the problem of making R^(l)_ij · ε^(l)_ij Gaussian.
In one embodiment, the server independently and randomly generates each element R^(l)_ij of the random perturbation matrix R^(l) for each network layer in the aforementioned step 21. In such a case, the generation may be further constrained so that each element R^(l)_ij conforms to the binary decomposition data distribution of the Gaussian distribution, namely DN(2); meanwhile, each element ε^(l)_ij of the noise matrix ε^(l) generated by the first terminal should also conform to DN(2). Their combination R^(l)_ij · ε^(l)_ij then satisfies a Gaussian distribution.
In another embodiment, in the aforementioned step 21, the server generates the corresponding random perturbation matrices by combining the random vectors of the respective layers. For example, when the server determines the random perturbation matrix R^(l) according to formula (2), R^(l)_ij is expressed as follows:

R^(l)_ij = r^(1)_j,               for l = 1;
R^(l)_ij = r^(l)_j / r^(l-1)_i,   for 2 <= l <= L-1;
R^(l)_ij = 1 / r^(L-1)_i,         for l = L.    (17)
It can be seen that the elements of the random perturbation matrix R^(l) arise from different combinations of the elements of the random vectors, which may impose contradictory requirements on those vector elements if the Gaussian-distribution requirement is to be met. For example, for an intermediate layer 2 <= l <= L-1, the product R^(l)_ij · ε^(l)_ij decomposes into a product of three factors, each of which should conform to DN(3). Then for the network layer l, when it is treated as the current layer, each vector element r^(l)_j of its random vector is required to conform to DN(3); yet when analyzing layer l+1, with layer l as the previous layer, the reciprocal 1/r^(l)_i is required to conform to DN(3). This imposes conflicting requirements on the data distribution of the vector elements r^(l)_j.
To address such conflicting requirements, in one embodiment, the original neural network is expanded. FIG. 4 shows a schematic diagram of the extended neural network. Assuming that the original neural network comprises N actual network layers, a transition network layer is inserted between each pair of adjacent actual network layers, forming N-1 transition network layers. The actual network layers are shown in FIG. 4 with bold solid frames and the transition network layers with dashed boxes. Thus, in the extended neural network of L = 2N-1 layers, the odd layers are actual network layers and the even layers are transition network layers.
Each actual network layer has a parameter matrix W^(l) to be trained, whereas each transition network layer has the identity matrix I as its parameter matrix; this matrix is fixed and needs no training, and only passes the output of the previous actual network layer, unchanged, to the next actual network layer. The transition network layers therefore have no influence on the processing of actual sample data and serve only to assist in generating the required random vectors.
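The pass-through property holds because ReLU(I^T y) = ReLU(y) = y whenever y is the (already non-negative) ReLU output of the previous actual layer. A quick illustrative check:

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)
y_prev = relu(np.random.default_rng(0).normal(size=5))  # output of an actual layer
I = np.eye(5)                                           # transition layer parameters
assert np.array_equal(relu(I.T @ y_prev), y_prev)       # passed through unchanged
```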
Thus, based on the extended neural network, when the server determines the random perturbation matrices from the random vectors in the manner of formula (2), the elements r^(l)_j of the random vector of layer l should follow the data distribution:

r^(l)_j ~ DN(2),       for the input layer l = 1;
r^(l)_j ~ DN(3),       for odd intermediate layers 3 <= l <= L-2 (actual network layers);
1/r^(l)_j ~ DN(3),     for even layers 2 <= l <= L-3 (transition network layers);
1/r^(L-1)_j ~ DN(2),   for the last transition network layer l = L-1.    (18)

That is, if the network layer l is an intermediate layer among the actual network layers (an odd layer), each vector element of its random vector conforms to DN(3), the ternary decomposition data distribution of the Gaussian distribution; if it is a transition network layer (an even layer) other than the last, the reciprocal of each vector element of its random vector conforms to DN(3). For the last transition network layer (layer L-1), the reciprocals of the vector elements of its random vector conform to DN(2). In addition, for the input layer, the vector elements of its random vector conform to DN(2), the binary decomposition data distribution of the Gaussian distribution.
When the extended neural network is employed, in one embodiment, the server generates random vectors for all network layers according to the data distribution shown in formula (18), but determines the random perturbation matrices R^(l), and correspondingly computes the perturbation encryption parameter matrices Ŵ^(l), only for the actual network layers. Accordingly, in step 23 the server sends only the perturbation encryption parameter matrices of the actual network layers to the first terminal.
In another embodiment, the server may determine random perturbation matrices and compute perturbation encryption parameter matrices for all network layers, and send the perturbation encryption parameter matrices of all network layers to the first terminal as the perturbation encryption model.
Accordingly, if the first terminal receives only the actual network layers, it performs normal neural network forward propagation and back propagation in step 24. If the first terminal receives all network layers, the even (transition) layers may be omitted, and only the actual network layers participate in forward propagation and back propagation in step 24.
And when the noise matrix ε^(l) for layer l is determined in step 25, its matrix elements are made to satisfy the following constraints:

ε^(l)_ij ~ DN(3),   for intermediate layers among the actual network layers (odd layers);
ε^(l)_ij ~ DN(2),   for the input layer and the output layer.    (19)

That is, for an intermediate layer among the actual network layers (an odd layer), each matrix element satisfies the ternary decomposition data distribution DN(3) of the Gaussian distribution; for the input layer and the output layer, each matrix element satisfies the binary decomposition data distribution DN(2) of the Gaussian distribution.
Combining the above formulas (17), (18) and (19), it can be proven that R^(l)_ij · ε^(l)_ij satisfies a Gaussian distribution. Specifically:

for the input layer, both r^(1)_j and ε^(1)_ij conform to DN(2), so their product conforms to a Gaussian distribution;

for the intermediate odd layers, i.e. the intermediate layers among the actual network layers, r^(l)_j, 1/r^(l-1)_i and ε^(l)_ij each conform to DN(3), so their product conforms to a Gaussian distribution;

for the output layer, both 1/r^(L-1)_i and ε^(L)_ij conform to DN(2), so their product conforms to a Gaussian distribution.
Next, in step 26, the first terminal sends the confusion gradient terms, with the second noise superimposed, to the server. For example, in the case of cross-entropy loss, the noisy perturbation gradient and the perturbation terms ψ^(l) and β^(l) are sent to the server. In the case of MSE loss, the noisy perturbation gradient and the perturbation terms σ^(l) and β^(l) are sent to the server.
Then, in step 27, the server uses the random perturbation matrix R^(l) corresponding to each network layer to perform recovery processing on the confusion gradient terms of the corresponding network layers, obtaining gradient recovery results for the plurality of network layers.
For example, in the case of cross-entropy loss, the server may perform the recovery processing using the matrix R^(l) according to formula (14); in the case of MSE loss, a similar recovery processing may be performed according to formula (8). The recovery yields the true gradient with the final noise superimposed on it. Since it has already been ensured that this final noise satisfies a Gaussian distribution, it meets the requirement of differential privacy.
Then, in step 28, the server may aggregate the gradient recovery results obtained from the terminals, and update the current parameter matrix of each network layer according to the aggregation result, thereby implementing an iterative update of the model parameters.
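Step 28 is a standard gradient aggregation and descent step; a generic sketch, in which the simple averaging and the learning rate are assumptions rather than details specified by the patent:

```python
import numpy as np

def aggregate_and_update(weights, grads_per_terminal, lr=0.1):
    """Average each layer's recovered gradients over terminals, then take a
    gradient-descent step on that layer's current parameter matrix."""
    for l in range(len(weights)):
        g = np.mean([g_t[l] for g_t in grads_per_terminal], axis=0)
        weights[l] = weights[l] - lr * g
    return weights

# Two terminals, one 2x2 layer, made-up gradients
weights = [np.zeros((2, 2))]
grads = [[np.ones((2, 2))], [3.0 * np.ones((2, 2))]]
weights = aggregate_and_update(weights, grads, lr=0.1)
```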
Reviewing the above process, it can be seen that during the iterative model update, the server perturbation-encrypts the parameters of each network layer before issuing them to the terminals. A terminal can process its sample data directly on the basis of the perturbation encryption model to obtain the perturbation gradients. The perturbation encryption of the model parameters thus ensures the security of the model parameters on one hand; on the other hand, it allows the terminal to process samples directly without decryption, greatly saving the terminal's computing resources. Such an approach is particularly advantageous for small terminals with limited computing power. In addition, after obtaining the perturbation gradients, the terminal adds noise to them such that the final noise after perturbation recovery conforms to a Gaussian distribution, thereby meeting the requirement of differential privacy. As a result, neither the server nor any other party can obtain the true values of the gradients, further ensuring the security of the private data.
According to an embodiment of another aspect, an apparatus for jointly training a business model based on privacy protection is further provided, and the apparatus is deployed in a server, and the server may be implemented as any device or device cluster having computing and processing capabilities. FIG. 5 shows a schematic diagram of a training apparatus deployed in a server, according to one embodiment. As shown in fig. 5, the training apparatus 500 includes:
a disturbance matrix determination unit 51 configured to determine, for a plurality of network layers in the neural network, corresponding random disturbance matrices;
the disturbance encryption unit 52 is configured to perform disturbance processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain a disturbance encryption parameter matrix of the network layer;
a sending unit 53, configured to send a disturbed encryption model to a plurality of terminals, where the disturbed encryption model includes disturbed encryption parameter matrices corresponding to the plurality of network layers;
a receiving unit 54 configured to receive, from a first terminal that is any one of the plurality of terminals, confusion gradient terms respectively corresponding to the plurality of network layers, where each confusion gradient term is obtained by superimposing a second noise on a first noise gradient term, the first noise gradient term is obtained by processing a first sample set local to the first terminal using the perturbation encryption model, and the combination of the second noise with the random perturbation matrix of the corresponding network layer satisfies a Gaussian distribution;
a disturbance recovery unit 55, configured to perform recovery processing on the confusion gradient items of the corresponding network layers by using the random disturbance matrices corresponding to the plurality of network layers, so as to obtain gradient recovery results of the plurality of network layers;
an aggregation updating unit 56 configured to aggregate the gradient restoration results corresponding to the plurality of terminals, and update the current parameter matrices of the plurality of network layers according to the aggregation result.
According to one embodiment, the disturbance matrix determination unit 51 comprises (not shown): a random vector determination module configured to determine, for each network layer of the neural network, a corresponding random vector, dimensions of which are the same as the number of neurons in the corresponding network layer; and the matrix determination module is configured to determine a random disturbance matrix corresponding to a first network layer according to the random vector of the first network layer and the random vector of an adjacent network layer, wherein the first network layer is an intermediate layer in the neural network.
In one embodiment, the neural network comprises N actual network layers to be trained, and N-1 transition network layers inserted between adjacent actual network layers, wherein the transition network layers have fixed identity matrixes as parameter matrixes thereof; in such a case, the random vector determination module is specifically configured to: determining a first random vector aiming at each middle layer in an actual network layer; determining a second random vector for each transition network layer; wherein vector elements in the first random vector and the second random vector have different data distributions.
Further, in an example, each vector element in the first random vector conforms to a ternary decomposition data distribution of a gaussian distribution; the reciprocal of each vector element in the second random vector accords with the ternary decomposition data distribution of Gaussian distribution; the first network layer is an intermediate layer in an actual network layer; in such a case, the matrix determination module is specifically configured to: and respectively combining each vector element in the first random vector corresponding to the first network layer with the reciprocal of each vector element in the second random vector corresponding to the previous transition network layer of the first network layer, and taking the combined result as each matrix element in the random disturbance matrix corresponding to the first network layer.
In one embodiment, the random vector determination module is further configured to: determining a third random vector aiming at an input layer in an actual network layer, wherein each vector element accords with the binary decomposition data distribution of Gaussian distribution; and, the matrix determination module is further configured to: and taking the elements in the third random vector corresponding to the input layer as matrix elements to obtain a random disturbance matrix corresponding to the input layer.
In one embodiment, the random vector determination module is further configured to determine, for the last transition network layer, a last second random vector, wherein reciprocals of respective vector elements conform to a binary decomposition data distribution of a gaussian distribution; and, the matrix determination module is further configured to: and aiming at an output layer in the actual network layer, taking the reciprocal of each element in the last second random vector as a matrix element to obtain a random disturbance matrix corresponding to the output layer.
According to one embodiment, the aforementioned plurality of network layers are the N actual network layers to be trained.
According to one embodiment, the perturbation encryption unit 52 is specifically configured to: for each network layer except the output layer in the N actual network layers, performing corresponding position element combination on the random disturbance matrix corresponding to the network layer and the current parameter matrix to obtain a disturbance encryption parameter matrix of the network layer; and for the output layer, performing corresponding position element combination on the random disturbance matrix corresponding to the output layer and the current parameter matrix of the output layer, and superposing an additional disturbance matrix aiming at the output layer to obtain a disturbance encryption parameter matrix of the output layer.
In various embodiments, the business model is used to predict business objects, the business objects including one of: user, merchant, transaction, image, text, audio.
According to an embodiment of another aspect, an apparatus for jointly training a business model based on privacy protection is also provided, and the apparatus is deployed in a first terminal, and the first terminal may be any type of terminal computing device. Fig. 6 shows a schematic diagram of a training apparatus deployed in a terminal according to one embodiment. As shown in fig. 6, the training apparatus 600 includes:
a receiving unit 61, configured to receive a perturbation encryption model from a server, where the perturbation encryption model includes perturbation encryption parameter matrices corresponding to a plurality of network layers in the neural network, and the perturbation encryption parameter matrices are obtained by performing perturbation processing on current parameter matrices of the network layers by using random perturbation matrices of the corresponding network layers;
a gradient obtaining unit 62, configured to process a first sample set local to the first terminal by using the perturbed encryption model, to obtain a first noise gradient term for each of the multiple network layers;
a noise adding unit 63 configured to superimpose a second noise on the first noise gradient term to obtain a confusion gradient term for each network layer; wherein a combination result of the second noise and the random disturbance matrix of the corresponding network layer satisfies a Gaussian distribution;
a sending unit 64, configured to send the confusion gradient items to the server, so that the server recovers the confusion gradient items of the corresponding network layers by using the random perturbation matrix, and aggregates the recovery gradients corresponding to the multiple terminals, thereby updating the current parameter matrices of the multiple network layers.
In one embodiment, the neural network comprises N actual network layers to be trained, and N-1 transition network layers inserted between adjacent actual network layers, wherein the transition network layers have fixed identity matrixes as parameter matrixes thereof; the plurality of network layers are the N actual network layers to be trained.
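The layer arrangement described above (N actual layers interleaved with N-1 identity transition layers) can be sketched as below. The weight-shape convention (inputs × outputs) and the function name are assumptions for illustration only.

```python
import numpy as np

def build_layer_stack(actual_weights):
    """Sketch: interleave the N actual network layers to be trained with N-1
    transition layers whose parameter matrix is a fixed identity, so each
    transition layer passes activations through unchanged."""
    stack = []
    for i, W in enumerate(actual_weights):
        stack.append(("actual", W))
        if i < len(actual_weights) - 1:
            stack.append(("transition", np.eye(W.shape[1])))  # fixed identity matrix
    return stack
```

For N = 3 actual layers this yields a 5-layer stack, with identity matrices sized to each actual layer's output dimension.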
According to one embodiment, the gradient acquisition unit 62 is specifically configured to: inputting the characteristic data of each sample in the first sample set into the disturbance encryption model to obtain disturbance output of each network layer; and obtaining the first noise gradient term according to the disturbance output of each network layer, the label data of each sample and a preset loss function.
In one embodiment, the noise adding unit 63 is specifically configured to: determining a noise matrix corresponding to each network layer; and multiplying the noise matrix by preset noise amplitude and variance to serve as second noise corresponding to each network layer, and superposing the second noise on the first noise gradient item, wherein the noise amplitude is not less than the norm of the gradient.
Further, the determining, by the noise adding unit 63, the noise matrix corresponding to each network layer may include: for an intermediate layer in the plurality of network layers, determining a first noise matrix, wherein each matrix element satisfies a ternary decomposition data distribution of a Gaussian distribution; for an input layer and an output layer of the plurality of network layers, a second noise matrix is determined, wherein each matrix element satisfies a binary decomposition data distribution of a Gaussian distribution.
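The scaling rule of the noise adding unit (noise matrix multiplied by a preset amplitude and variance, with the amplitude not less than the gradient norm) can be sketched as follows. A standard normal matrix stands in for the patent's ternary/binary decomposition distributions, which differ between intermediate and input/output layers and are not specified here.

```python
import numpy as np

def add_second_noise(grad, sigma, rng):
    """Sketch: scale a noise matrix by the preset amplitude C (chosen to be
    no less than the norm of the gradient) and the variance parameter sigma,
    then superimpose it on the first noise gradient term."""
    C = max(1.0, np.linalg.norm(grad))   # amplitude not less than the gradient norm
    Z = rng.standard_normal(grad.shape)  # placeholder for the layer's noise matrix
    return grad + C * sigma * Z          # resulting confusion gradient term
```

Setting `sigma=0` recovers the original gradient term, which isolates the effect of the variance parameter.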
With the above apparatus, the perturbation encryption of the model and the differential-privacy noise processing of the gradients can be used together to protect the model parameters and gradient data from leakage, thereby ensuring the security of private data.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The foregoing embodiments further describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that the above are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (18)

1. A method for jointly training a business model based on privacy protection, the business model being implemented by a neural network, the method being performed by a server, comprising:
determining corresponding random disturbance matrixes for a plurality of network layers in the neural network;
carrying out disturbance processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain a disturbance encryption parameter matrix of the network layer;
sending a disturbance encryption model to a plurality of terminals, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes corresponding to the plurality of network layers;
receiving confusion gradient items respectively corresponding to the plurality of network layers from a first terminal, which is any one of the plurality of terminals, wherein the confusion gradient items are obtained by superposing second noise on first noise gradient items, the first noise gradient items are obtained by processing a first sample set local to the first terminal by using the perturbation encryption model, and the combination result of the second noise and the random perturbation matrix of the corresponding network layer satisfies a Gaussian distribution;
restoring the confusion gradient items of the corresponding network layers by using the random disturbance matrixes corresponding to the network layers to obtain gradient restoration results of the network layers;
and aggregating the gradient recovery results corresponding to the plurality of terminals, and updating the current parameter matrixes of the plurality of network layers according to the aggregation result.
2. The method of claim 1, wherein determining, for a plurality of network layers in the neural network, corresponding random perturbation matrices comprises:
for each network layer of the neural network, determining a corresponding random vector, wherein the dimensionality of the random vector is the same as the number of neurons in the corresponding network layer;
and determining a random disturbance matrix corresponding to a first network layer according to the random vector of the first network layer and the random vector of the adjacent network layer, wherein the first network layer is an intermediate layer in the neural network.
3. The method of claim 2, wherein the neural network comprises N actual network layers to be trained, and N-1 transition network layers interposed between adjacent actual network layers, the transition network layers having a fixed identity matrix as their parameter matrices;
for each network layer of the neural network, determining a corresponding random vector, comprising:
determining a first random vector aiming at each middle layer in an actual network layer;
determining a second random vector for each transition network layer; wherein vector elements in the first random vector and the second random vector have different data distributions.
4. The method of claim 3, wherein each vector element in the first random vector conforms to a ternary decomposition data distribution of a Gaussian distribution; the reciprocal of each vector element in the second random vector accords with the ternary decomposition data distribution of Gaussian distribution; the first network layer is an intermediate layer in an actual network layer;
determining a random disturbance matrix corresponding to a first network layer according to the random vector of the first network layer and the random vector of an adjacent network layer, wherein the random disturbance matrix comprises:
and respectively combining each vector element in the first random vector corresponding to the first network layer with the reciprocal of each vector element in the second random vector corresponding to the previous transition network layer of the first network layer, and taking the combined result as each matrix element in the random disturbance matrix corresponding to the first network layer.
5. The method of claim 3, wherein determining, for each network layer of the neural network, a corresponding random vector, further comprises:
determining a third random vector aiming at an input layer in an actual network layer, wherein each vector element accords with the binary decomposition data distribution of Gaussian distribution;
determining, for a plurality of network layers in the neural network, corresponding random perturbation matrices, further comprising:
and taking the elements in the third random vector corresponding to the input layer as matrix elements to obtain a random disturbance matrix corresponding to the input layer.
6. The method of claim 3, wherein determining a second random vector for each transition network layer comprises: determining a last second random vector aiming at a last transition network layer, wherein the reciprocal of each vector element accords with the binary decomposition data distribution of Gaussian distribution;
determining, for a plurality of network layers in the neural network, corresponding random perturbation matrices, further comprising:
and aiming at an output layer in the actual network layer, taking the reciprocal of each element in the last second random vector as a matrix element to obtain a random disturbance matrix corresponding to the output layer.
7. The method of claim 3, wherein the plurality of network layers are the N actual network layers to be trained.
8. The method according to claim 7, wherein the perturbing the current parameter matrix of the corresponding network layer by using the random perturbation matrix to obtain the perturbed encryption parameter matrix of the network layer comprises:
for each network layer except the output layer in the N actual network layers, performing corresponding position element combination on the random disturbance matrix corresponding to the network layer and the current parameter matrix to obtain a disturbance encryption parameter matrix of the network layer;
and for the output layer, performing corresponding position element combination on the random disturbance matrix corresponding to the output layer and the current parameter matrix of the output layer, and superposing an additional disturbance matrix aiming at the output layer to obtain a disturbance encryption parameter matrix of the output layer.
9. The method of claim 1, wherein the business model is used to predict business objects, the business objects comprising one of: user, merchant, transaction, image, text, audio.
10. A method for jointly training a business model based on privacy protection, the business model being implemented by a neural network, the method being performed by a first terminal and comprising:
receiving a disturbance encryption model from a server, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes respectively corresponding to a plurality of network layers in the neural network, and the disturbance encryption parameter matrixes are obtained by performing disturbance processing on current parameter matrixes of the network layers by using random disturbance matrixes of the corresponding network layers;
processing a first sample set local to the first terminal by using the disturbance encryption model to obtain a first noise gradient item aiming at each network layer in the plurality of network layers;
superposing second noise on the first noise gradient term to obtain a confusion gradient term aiming at each network layer; wherein a combined result of the second noise and the random disturbance matrix of the corresponding network layer satisfies a Gaussian distribution;
and sending the confusion gradient items to the server, so that the server recovers the confusion gradient items of the corresponding network layers by using the random disturbance matrices, and aggregates the recovery gradients corresponding to a plurality of terminals, thereby updating the current parameter matrices of the plurality of network layers.
11. The method of claim 10, wherein the neural network comprises N actual network layers to be trained, and N-1 transition network layers interposed between adjacent actual network layers, the transition network layers having a fixed identity matrix as their parameter matrices;
the plurality of network layers are the N actual network layers to be trained.
12. The method of claim 10, wherein processing a first set of samples local to the first terminal using the perturbed cipher model to obtain a first noise gradient term for each of the plurality of network layers comprises:
inputting the characteristic data of each sample in the first sample set into the disturbance encryption model to obtain disturbance output of each network layer;
and obtaining the first noise gradient term according to the disturbance output of each network layer, the label data of each sample and a preset loss function.
13. The method of claim 10, wherein superimposing a second noise on the first noise gradient term comprises:
determining a noise matrix corresponding to each network layer;
and multiplying the noise matrix by preset noise amplitude and variance to serve as second noise corresponding to each network layer, and superposing the second noise on the first noise gradient item, wherein the noise amplitude is not less than the norm of the gradient.
14. The method of claim 13, wherein determining the noise matrix for each network layer comprises:
for an intermediate layer in the plurality of network layers, determining a first noise matrix, wherein each matrix element satisfies a ternary decomposition data distribution of a Gaussian distribution;
for an input layer and an output layer of the plurality of network layers, a second noise matrix is determined, wherein each matrix element satisfies a binary decomposition data distribution of a Gaussian distribution.
15. An apparatus for jointly training a business model based on privacy protection, wherein the business model is implemented by a neural network, and the apparatus is deployed in a server, and comprises:
a disturbance matrix determination unit configured to determine, for a plurality of network layers in the neural network, corresponding random disturbance matrices;
the disturbance encryption unit is configured to perform disturbance processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain a disturbance encryption parameter matrix of the network layer;
a sending unit configured to send a disturbed encryption model to a plurality of terminals, where the disturbed encryption model includes disturbed encryption parameter matrices corresponding to the plurality of network layers;
a receiving unit, configured to receive confusion gradient terms respectively corresponding to the plurality of network layers from a first terminal, which is any one of the plurality of terminals, where the confusion gradient terms are obtained by superimposing a second noise on a first noise gradient term, where the first noise gradient term is obtained by processing a first sample set local to the first terminal using the perturbation encryption model, and a combination result of the second noise and the random perturbation matrix of the corresponding network layer satisfies a Gaussian distribution;
the disturbance recovery unit is configured to perform recovery processing on the confusion gradient items of the corresponding network layers by using the random disturbance matrixes corresponding to the network layers to obtain gradient recovery results of the network layers;
and the aggregation updating unit is configured to aggregate the gradient recovery results corresponding to the plurality of terminals, and update the current parameter matrixes of the plurality of network layers according to the aggregation result.
16. An apparatus for jointly training a business model based on privacy protection, the business model being implemented by a neural network, the apparatus being deployed in a first terminal, comprising:
the receiving unit is configured to receive a disturbance encryption model from a server, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes respectively corresponding to a plurality of network layers in the neural network, and the disturbance encryption parameter matrixes are obtained by performing disturbance processing on current parameter matrixes of the network layers by using random disturbance matrixes of the corresponding network layers;
a gradient obtaining unit configured to process a first sample set local to the first terminal by using the perturbation encryption model to obtain a first noise gradient item for each of the plurality of network layers;
the noise adding unit is configured to superimpose second noise on the first noise gradient term to obtain a confusion gradient term aiming at each network layer; wherein a combined result of the second noise and the random disturbance matrix of the corresponding network layer satisfies a Gaussian distribution;
and the sending unit is configured to send the confusion gradient items to the server, so that the server recovers the confusion gradient items of the corresponding network layers by using the random disturbance matrix, and aggregates the recovery gradients corresponding to the plurality of terminals, thereby updating the current parameter matrixes of the plurality of network layers.
17. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-14.
18. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-14.
CN202011409592.4A 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection Active CN112541593B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011409592.4A CN112541593B (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection
CN202210742526.1A CN114936650A (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011409592.4A CN112541593B (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202210742526.1A Division CN114936650A (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection

Publications (2)

Publication Number Publication Date
CN112541593A CN112541593A (en) 2021-03-23
CN112541593B true CN112541593B (en) 2022-05-17

Family

ID=75015994

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202011409592.4A Active CN112541593B (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection
CN202210742526.1A Pending CN114936650A (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202210742526.1A Pending CN114936650A (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection

Country Status (1)

Country Link
CN (2) CN112541593B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011587B (en) * 2021-03-24 2022-05-10 支付宝(杭州)信息技术有限公司 Privacy protection model training method and system
CN113157399B (en) * 2021-05-17 2022-11-11 北京冲量在线科技有限公司 Unsupervised joint modeling method based on ARM architecture chip
CN113435592B (en) * 2021-05-22 2023-09-22 西安电子科技大学 Neural network multiparty collaborative lossless training method and system with privacy protection
CN113255002B (en) * 2021-06-09 2022-07-15 北京航空航天大学 Federal k nearest neighbor query method for protecting multi-party privacy
CN113222480B (en) * 2021-06-11 2023-05-12 支付宝(杭州)信息技术有限公司 Training method and device for challenge sample generation model
CN113221183B (en) * 2021-06-11 2022-09-16 支付宝(杭州)信息技术有限公司 Method, device and system for realizing privacy protection of multi-party collaborative update model
CN113722760A (en) * 2021-09-06 2021-11-30 支付宝(杭州)信息技术有限公司 Privacy protection model training method and system
CN113821732B (en) * 2021-11-24 2022-02-18 阿里巴巴达摩院(杭州)科技有限公司 Item recommendation method and equipment for protecting user privacy and learning system
CN115001748B (en) * 2022-04-29 2023-11-03 北京奇艺世纪科技有限公司 Model processing method and device and computer readable storage medium
CN114782176B (en) * 2022-06-23 2022-10-25 浙江数秦科技有限公司 Credit service recommendation method based on federal learning

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214404A (en) * 2017-07-07 2019-01-15 阿里巴巴集团控股有限公司 Training sample generation method and device based on secret protection
CN109388662B (en) * 2017-08-02 2021-05-25 创新先进技术有限公司 Model training method and device based on shared data
US10657259B2 (en) * 2017-11-01 2020-05-19 International Business Machines Corporation Protecting cognitive systems from gradient based attacks through the use of deceiving gradients
KR102061345B1 (en) * 2017-12-18 2019-12-31 경희대학교 산학협력단 Method of performing encryption and decryption based on reinforced learning and client and server system performing thereof
CN108520181B (en) * 2018-03-26 2022-04-22 联想(北京)有限公司 Data model training method and device
CN109034228B (en) * 2018-07-17 2021-10-12 陕西师范大学 Image classification method based on differential privacy and hierarchical relevance propagation
CN111788584A (en) * 2018-08-21 2020-10-16 华为技术有限公司 Neural network computing method and device
US20200104678A1 (en) * 2018-09-27 2020-04-02 Google Llc Training optimizer neural networks
CN109495476B (en) * 2018-11-19 2020-11-20 中南大学 Data stream differential privacy protection method and system based on edge calculation
US11599774B2 (en) * 2019-03-29 2023-03-07 International Business Machines Corporation Training machine learning model
CN111461215B (en) * 2020-03-31 2021-06-29 支付宝(杭州)信息技术有限公司 Multi-party combined training method, device, system and equipment of business model
CN111177792B (en) * 2020-04-10 2020-06-30 支付宝(杭州)信息技术有限公司 Method and device for determining target business model based on privacy protection
CN111177768A (en) * 2020-04-10 2020-05-19 支付宝(杭州)信息技术有限公司 Method and device for protecting business prediction model of data privacy joint training by two parties
CN111324911B (en) * 2020-05-15 2021-01-01 支付宝(杭州)信息技术有限公司 Privacy data protection method, system and device
CN111723404B (en) * 2020-08-21 2021-01-22 支付宝(杭州)信息技术有限公司 Method and device for jointly training business model

Also Published As

Publication number Publication date
CN112541593A (en) 2021-03-23
CN114936650A (en) 2022-08-23

Similar Documents

Publication Publication Date Title
CN112541593B (en) Method and device for jointly training business model based on privacy protection
CN111160573B (en) Method and device for protecting business prediction model of data privacy joint training by two parties
WO2021197037A1 (en) Method and apparatus for jointly performing data processing by two parties
CN112989368B (en) Method and device for processing private data by combining multiple parties
CN111178549B (en) Method and device for protecting business prediction model of data privacy joint training by two parties
CN111177791B (en) Method and device for protecting business prediction model of data privacy joint training by two parties
CN112199702B (en) Privacy protection method, storage medium and system based on federal learning
US11222138B2 (en) Privacy-preserving machine learning in the three-server model
Zheng et al. Gan-based key secret-sharing scheme in blockchain
CN111177768A (en) Method and device for protecting business prediction model of data privacy joint training by two parties
CN113221183B (en) Method, device and system for realizing privacy protection of multi-party collaborative update model
CN111738361A (en) Joint training method and device for business model
CN112805769B (en) Secret S-type function calculation system, secret S-type function calculation device, secret S-type function calculation method, and recording medium
CN112101946B (en) Method and device for jointly training business model
CN112101531B (en) Neural network model training method, device and system based on privacy protection
CN111291411B (en) Safe video anomaly detection system and method based on convolutional neural network
CN115842627A (en) Decision tree evaluation method, device, equipment and medium based on secure multi-party computation
CN116561787A (en) Training method and device for visual image classification model and electronic equipment
Zhang et al. SecureTrain: An approximation-free and computationally efficient framework for privacy-preserved neural network training
Shukur et al. Asymmetrical novel hyperchaotic system with two exponential functions and an application to image encryption
CN115186876A (en) Method and device for protecting data privacy of two-party joint training service prediction model
Wang et al. SieveNet: decoupling activation function neural network for privacy-preserving deep learning
CN117634633A (en) Method for federal learning and federal learning system
CN114358323A (en) Third-party-based efficient Pearson coefficient calculation method in federated learning environment
CN114547684A (en) Method and device for protecting multi-party joint training tree model of private data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant