CN112541593B - Method and device for jointly training business model based on privacy protection - Google Patents


Info

Publication number
CN112541593B
Authority
CN
China
Prior art keywords
disturbance
network
matrix
layer
random
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011409592.4A
Other languages
Chinese (zh)
Other versions
CN112541593A (en)
Inventor
熊涛 (Xiong Tao)
冯岩 (Feng Yan)
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202011409592.4A (granted as CN112541593B)
Priority to CN202210742526.1A (published as CN114936650A)
Publication of CN112541593A
Application granted
Publication of CN112541593B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioethics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

According to the method, a server determines a perturbation matrix for each network layer in the neural network implementing the business model, perturbation-encrypts the parameters of each network layer with its perturbation matrix to obtain a perturbation-encrypted model, and distributes this model to each terminal. Each terminal processes its local training samples using the perturbation-encrypted model to obtain perturbed gradients, and superimposes noise on them. By carefully designing the distribution of this noise, the noise remaining after the perturbation matrix is removed conforms to a Gaussian distribution, so the requirement of differential privacy is met. The server can then perform perturbation recovery and aggregation on the noisy gradients sent by the terminals in order to update the parameters of the neural network model.

Description

Method and device for jointly training business model based on privacy protection
Technical Field
One or more embodiments of the present disclosure relate to the field of machine learning, and more particularly, to a model joint training method and apparatus for protecting privacy in a distributed system.
Background
The rapid development of machine learning has enabled machine learning models to be applied in a wide variety of business scenarios. Since a model's prediction performance depends on the abundance and availability of its training samples, obtaining a better-performing business prediction model generally requires jointly training the model with training data from multiple platforms.
Specifically, in a scenario where data is partitioned vertically, multiple platforms may hold different feature data for the same batch of business objects. For example, in a machine-learning-based merchant classification analysis scenario, an electronic payment platform holds merchants' transaction flow data, an e-commerce platform stores their sales data, and a banking institution holds their loan data. In a scenario where data is partitioned horizontally, multiple platforms may each hold the same attribute features for different business objects; for example, banking institutions in different regions each hold loan data for locally registered merchants. There are, of course, also cases combining vertical and horizontal partitioning.
The training data local to each platform often contains private information about local business objects, especially user privacy. Furthermore, a local model trained on local training data may also risk leaking local data features. Therefore, in scenarios where multiple parties jointly train a model, data security and data privacy are a major challenge.
It is therefore desirable to provide an improved scheme that, when multiple parties jointly train a business model, ensures that no party's private data is leaked and that data security is maintained.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for jointly training a business model, which protect private data from leakage and ensure data security by perturbation-encrypting the model and adding noise to the gradients.
According to a first aspect, there is provided a method for jointly training a business model based on privacy protection, the business model being implemented by a neural network, the method being performed by a server and comprising:
determining corresponding random disturbance matrixes for a plurality of network layers in the neural network;
carrying out disturbance processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain a disturbance encryption parameter matrix of the network layer;
sending a disturbance encryption model to a plurality of terminals, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes corresponding to the plurality of network layers;
receiving confusion gradient items respectively corresponding to the plurality of network layers from a first terminal, the first terminal being any one of the plurality of terminals, wherein the confusion gradient items are obtained by superposing second noise on first noise gradient items, the first noise gradient items are obtained by processing a first sample set local to the first terminal using the perturbation encryption model, and the combination result of the second noise and the random perturbation matrix of the corresponding network layer satisfies a Gaussian distribution;
restoring the confusion gradient items of the corresponding network layers by using the random disturbance matrixes corresponding to the network layers to obtain gradient restoration results of the network layers;
and aggregating the gradient recovery results corresponding to the plurality of terminals, and updating the current parameter matrixes of the plurality of network layers according to the aggregation result.
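The encrypt-then-recover flow on the server side can be sketched minimally in NumPy, assuming a simple element-wise multiplicative perturbation (the patent's concrete construction derives each R^(l) from per-layer random vectors; the shapes and key distribution here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# current parameter matrices W^(l) of three network layers
weights = [rng.normal(size=s) for s in [(4, 5), (5, 3), (3, 2)]]

# step 1: one random perturbation matrix R^(l) per layer, same shape as W^(l)
Rs = [rng.uniform(0.5, 2.0, size=W.shape) for W in weights]

# step 2: perturbation-encrypt by element-wise combination; only these
# perturbed matrices would be sent to the terminals
enc = [W * R for W, R in zip(weights, Rs)]

# recovery step: the server holds the keys, so quantities expressed in the
# perturbed coordinates can be mapped back exactly
dec = [We / R for We, R in zip(enc, Rs)]
```

The perturbed matrices mask the true parameter values, yet the key holder inverts the transform without loss.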
According to an embodiment, determining, for a plurality of network layers in the neural network, a corresponding random perturbation matrix specifically includes: for each network layer of the neural network, determining a corresponding random vector, wherein the dimensionality of the random vector is the same as the number of neurons in the corresponding network layer; and determining a random disturbance matrix corresponding to a first network layer according to the random vector of the first network layer and the random vector of the adjacent network layer, wherein the first network layer is an intermediate layer in the neural network.
In one embodiment, the neural network comprises N actual network layers to be trained, and N-1 transition network layers inserted between adjacent actual network layers, wherein each transition network layer has a fixed identity matrix as its parameter matrix; in such a case, determining the corresponding random vectors specifically includes: determining a first random vector for each intermediate layer among the actual network layers; and determining a second random vector for each transition network layer; wherein vector elements in the first random vector and the second random vector have different data distributions.
Further, in an example, each vector element in the first random vector conforms to a ternary decomposition data distribution of a gaussian distribution; the reciprocal of each vector element in the second random vector accords with the ternary decomposition data distribution of Gaussian distribution; the first network layer is an intermediate layer in an actual network layer; under such a condition, determining the random disturbance matrix corresponding to the first network layer specifically includes: and respectively combining each vector element in the first random vector corresponding to the first network layer with the reciprocal of each vector element in the second random vector corresponding to the previous transition network layer of the first network layer, and taking the combined result as each matrix element in the random disturbance matrix corresponding to the first network layer.
In one embodiment, determining, for each network layer of the neural network, a corresponding random vector further comprises: determining a third random vector aiming at an input layer in an actual network layer, wherein each vector element accords with the binary decomposition data distribution of Gaussian distribution; and, the step of determining the random perturbation matrix further comprises: and taking the elements in the third random vector corresponding to the input layer as matrix elements to obtain a random disturbance matrix corresponding to the input layer.
In one embodiment, for the last transition network layer, determining a last second random vector, wherein the reciprocal of each vector element conforms to the binary decomposition data distribution of the gaussian distribution; and the step of determining the corresponding random perturbation matrix further comprises: and aiming at an output layer in the actual network layer, taking the reciprocal of each element in the last second random vector as a matrix element to obtain a random disturbance matrix corresponding to the output layer.
According to one embodiment, the aforementioned plurality of network layers are the N actual network layers to be trained.
According to an embodiment, performing perturbation processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain the disturbance encryption parameter matrix of the network layer specifically includes: for each network layer other than the output layer among the N actual network layers, combining elements at corresponding positions of the random disturbance matrix and the current parameter matrix of the network layer to obtain the disturbance encryption parameter matrix of the network layer; and for the output layer, combining elements at corresponding positions of the random disturbance matrix corresponding to the output layer and the current parameter matrix of the output layer, and superposing an additional disturbance matrix for the output layer, to obtain the disturbance encryption parameter matrix of the output layer.
In various embodiments, the business model is used to predict business objects, the business objects including one of: user, merchant, transaction, image, text, audio.
According to a second aspect, there is provided a method for jointly training a business model based on privacy protection, the business model being implemented by a neural network, the method being performed by a first terminal and comprising:
receiving a disturbance encryption model from a server, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes respectively corresponding to a plurality of network layers in the neural network, and the disturbance encryption parameter matrixes are obtained by performing disturbance processing on current parameter matrixes of the network layers by using random disturbance matrixes of the corresponding network layers;
processing a first sample set local to the first terminal by using the disturbance encryption model to obtain a first noise gradient item aiming at each network layer in the plurality of network layers;
superposing second noise on the first noise gradient term to obtain a confusion gradient term aiming at each network layer; wherein a combined result of the second noise and the random disturbance matrix of the corresponding network layer satisfies a Gaussian distribution;
and sending the confusion gradient items to the server, so that the server recovers the confusion gradient items of the corresponding network layers by using the random disturbance matrixes, and aggregates the recovered gradients corresponding to a plurality of terminals, thereby updating the current parameter matrixes of the plurality of network layers.
In one embodiment, the neural network comprises N actual network layers to be trained, and N-1 transition network layers inserted between adjacent actual network layers, wherein the transition network layers have fixed identity matrixes as parameter matrixes thereof; the plurality of network layers are the N actual network layers to be trained.
According to one embodiment, the first noise gradient term is obtained by: inputting the characteristic data of each sample in the first sample set into the disturbance encryption model to obtain disturbance output of each network layer; and obtaining the first noise gradient term according to the disturbance output of each network layer, the label data of each sample and a preset loss function.
In one embodiment, superimposing second noise on the first noise gradient term specifically includes: determining a noise matrix corresponding to each network layer; scaling the noise matrix by a preset noise amplitude and variance to serve as the second noise corresponding to each network layer; and superposing the second noise on the first noise gradient item, wherein the noise amplitude is not less than the norm of the gradient.
Further, determining the noise matrix corresponding to each network layer may include: for an intermediate layer in the plurality of network layers, determining a first noise matrix, wherein each matrix element satisfies a ternary decomposition data distribution of a Gaussian distribution; for an input layer and an output layer of the plurality of network layers, a second noise matrix is determined, wherein each matrix element satisfies a binary decomposition data distribution of a Gaussian distribution.
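The amplitude-and-variance scaling just described can be sketched as follows. A standard Gaussian noise matrix is used here as a simplification; the claims instead draw the matrix elements from special ternary/binary decomposition distributions so that the noise remains Gaussian after perturbation recovery. The function name and `sigma` parameter are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def superimpose_second_noise(grad, sigma, amplitude=None):
    # The noise amplitude C is chosen no smaller than the norm of the
    # gradient (per the claim above); the noise matrix is scaled by
    # C * sigma and superimposed on the first noise gradient term.
    C = max(np.linalg.norm(grad), amplitude or 0.0)
    noise_matrix = rng.standard_normal(grad.shape)
    return grad + C * sigma * noise_matrix

g = rng.normal(size=(5, 3))              # a first noise gradient term
confusion = superimpose_second_noise(g, sigma=0.1)
```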
According to a third aspect, there is provided an apparatus for jointly training a business model based on privacy protection, where the business model is implemented by a neural network, and the apparatus is deployed in a server, and includes:
a disturbance matrix determination unit configured to determine, for a plurality of network layers in the neural network, corresponding random disturbance matrices;
the disturbance encryption unit is configured to perform disturbance processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain a disturbance encryption parameter matrix of the network layer;
a sending unit configured to send a disturbed encryption model to a plurality of terminals, where the disturbed encryption model includes disturbed encryption parameter matrices corresponding to the plurality of network layers;
a receiving unit, configured to receive confusion gradient terms respectively corresponding to the plurality of network layers from a first terminal, the first terminal being any one of the plurality of terminals, where the confusion gradient terms are obtained by superimposing second noise on first noise gradient terms, the first noise gradient terms are obtained by processing a first sample set local to the first terminal using the perturbation encryption model, and the combination result of the second noise and the random perturbation matrix of the corresponding network layer satisfies a Gaussian distribution;
the disturbance recovery unit is configured to recover the confusion gradient items of the corresponding network layers by using the random disturbance matrixes corresponding to the plurality of network layers to obtain gradient recovery results of the plurality of network layers;
and the aggregation updating unit is configured to aggregate the gradient recovery results corresponding to the plurality of terminals, and update the current parameter matrixes of the plurality of network layers according to the aggregation result.
According to a fourth aspect, there is provided an apparatus for jointly training a business model based on privacy protection, where the business model is implemented by a neural network, and the apparatus is deployed in a first terminal, and includes:
the receiving unit is configured to receive a disturbance encryption model from a server, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes respectively corresponding to a plurality of network layers in the neural network, and the disturbance encryption parameter matrixes are obtained by performing disturbance processing on current parameter matrixes of the network layers by using random disturbance matrixes of the corresponding network layers;
a gradient obtaining unit configured to process a first sample set local to the first terminal by using the perturbation encryption model to obtain a first noise gradient item for each of the plurality of network layers;
the noise adding unit is configured to superimpose second noise on the first noise gradient term to obtain a confusion gradient term aiming at each network layer; wherein a combined result of the second noise and the random disturbance matrix of the corresponding network layer satisfies a Gaussian distribution;
and the sending unit is configured to send the confusion gradient items to the server, so that the server recovers the confusion gradient items of the corresponding network layers by using the random disturbance matrix, and aggregates the recovery gradients corresponding to the plurality of terminals, thereby updating the current parameter matrixes of the plurality of network layers.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.
According to a sixth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first or second aspect.
According to the method and apparatus provided by the embodiments of this specification, in each iteration of model updating, the server perturbation-encrypts the parameters of each network layer before sending them to the terminals. A terminal can directly process its sample data based on the perturbation-encrypted model to obtain perturbed gradients. The perturbation encryption of the model parameters ensures the security of the parameters, and, because the terminal can process samples directly without decrypting, it greatly saves the terminal's computing resources. In addition, after obtaining the perturbed gradients, the terminal adds noise to them, and the final noise remaining after perturbation recovery conforms to a Gaussian distribution, meeting the requirement of differential privacy. As a result, neither the server nor any other party can obtain the true values of the gradients, further ensuring the security of the private data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 illustrates an example scenario for business model training via federated learning;
FIG. 2 illustrates a flow diagram of a business model training method to protect privacy, according to one embodiment;
FIG. 3 shows a schematic diagram of a plurality of network layers in a neural network;
FIG. 4 shows a schematic diagram of an extended neural network;
FIG. 5 shows a schematic diagram of a training apparatus deployed in a server, according to one embodiment;
fig. 6 shows a schematic diagram of a training apparatus deployed in a terminal according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
FIG. 1 illustrates an example scenario of business model training through federated learning. In the schematic scenario of fig. 1, the distributed system includes a server and N terminals, each having a training sample for model training. The terminal may be a data platform of a large organization, such as a bank, a hospital, a payment platform, etc., or a small device, such as a personal PC, a smart phone, an IoT device, etc. The device types of the respective terminals may be the same or different.
The business model to be trained is used for predicting business objects, where a business object may be any of various objects such as a user, merchant, transaction, image, text, or audio. For model training, each terminal has its own sample set of business objects, where each sample includes feature information of a business object as sample features, and further includes a label corresponding to the prediction target; the label may be a classification label or a regression-value label. For example, in one specific example, the business object is a user represented by an account. Accordingly, sample features may include, for example, the registration duration of the account, registration information, frequency of use over a recent period, frequency of comments made, and so on; the label may be a user classification label, for example indicating the group to which the user belongs, or indicating whether the account is an abnormal account (spam account, paid-poster account, stolen account, etc.). In another example, the business object is a transaction. Accordingly, the sample features may include, for example, transaction amount, transaction time, payment channel, attribute information of the transacting parties, and the like. This specification does not limit the business objects, and the various cases are not described in further detail.
In a typical federated learning process, the server determines the structure and algorithm of the business model to be trained, initializes it, and then updates the model through multiple iterations. In each iteration, the server issues the current parameters W of the model to each terminal. Each terminal i trains the model with the current parameters W on its local training samples to obtain the gradient G_i of the model parameters, i.e., the amount of change or update of the parameters. Terminal i then transmits this gradient information to the server. The server obtains N pieces of gradient information from the N terminals, aggregates them to obtain a comprehensive gradient for the current iteration, and updates the model parameters according to the comprehensive gradient, until a training end condition is reached. In this way, the server and the N terminals cooperatively realize joint training of the model.
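The plain (unprotected) iteration just described can be sketched end to end; the least-squares loss and the synthetic terminal data are illustrative stand-ins for the business model, not the patent's concrete protocol:

```python
import numpy as np

def local_gradient(W, X, y):
    # Terminal i: gradient G_i of a least-squares loss on its local samples
    # (a stand-in for the business model's true loss).
    return X.T @ (X @ W - y) / len(X)

def federated_round(W, terminals, lr=0.1):
    # Server issues current parameters W; each terminal returns its gradient;
    # the server aggregates by averaging and takes one update step.
    grads = [local_gradient(W, X, y) for X, y in terminals]
    return W - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 1))
terminals = []
for _ in range(4):                       # N = 4 terminals, horizontal split
    X = rng.normal(size=(50, 3))
    terminals.append((X, X @ W_true))

W = np.zeros((3, 1))
for _ in range(300):                     # iterate until convergence
    W = federated_round(W, terminals)
```

After enough rounds the jointly trained parameters approach the generating parameters, although, as discussed next, exchanging plaintext gradients this way leaks information.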
However, implementing the above federal learning procedure unprotected presents a risk of privacy disclosure. On one hand, the parameter gradient obtained based on the local sample may carry information of the local sample, and the local sample is often privacy data that needs to be protected by the data platform, for example, personal information of the user, bank flow, medical records, and the like. A malicious attacker may learn the information of the training sample from the gradient information of the interaction between the terminal and the server. Or, one terminal may reversely deduce gradient information of other terminals from parameter information issued by the server, so as to infer training sample data of the terminal. In addition, there is a privacy problem in the output of the model, that is, an attacker may guess whether a training sample contains a specific data record by querying the model during or after training.
In order to protect the privacy data of each terminal in the federal learning process, the inventor proposes that the server adopts a disturbance encryption algorithm to carry out disturbance encryption on model parameters and then distributes the model parameters to each terminal. And the terminal processes the local training sample by using the disturbance encryption model to obtain a disturbance gradient. In order to avoid that the server recovers the true gradient by using a decryption mode corresponding to the disturbance encryption algorithm, the terminal also superimposes noise on the disturbance gradient. By carefully designing the distribution of the noise, the noise after disturbance decryption conforms to Gaussian distribution, thereby meeting the requirement of differential privacy protection. In this way, the true plaintext values of the model parameters and the gradients are not exposed during the federal learning process, thereby ensuring the security of the private data.
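A toy instance of the key property above: with a multiplicative perturbation, the terminal can add noise without knowing the key, yet the noise after the server's recovery is still exactly Gaussian. Here the secret perturbation entries are random signs, a drastic simplification of the patent's ternary/binary decomposition distributions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Secret perturbation entries R_ij in {-1, +1}, held only by the server.
# The terminal adds standard Gaussian noise z without knowing R; after the
# server's recovery the noise becomes z / R = z * R, which is again exactly
# standard Gaussian because N(0, 1) is symmetric about zero.
n = 200_000
R = rng.choice([-1.0, 1.0], size=n)      # secret key
z = rng.standard_normal(n)               # terminal's noise, drawn without R
recovered_noise = z / R
```

The empirical mean and standard deviation of `recovered_noise` match a standard Gaussian, illustrating how differential-privacy noise can survive perturbation recovery.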
The following describes a specific implementation of the above concept.
FIG. 2 illustrates a flow diagram of a business model training method that protects privacy, according to one embodiment. The flow involves a server and a first terminal; the server may be implemented by any apparatus, device, or device cluster having computing and processing capabilities. The first terminal is any one of the plurality of terminals participating in federated learning and may be of any device type. Only one terminal is shown in the figure for simplicity and clarity. However, it should be understood that each of the multiple terminals in the federated learning process can implement the training process in the manner of the first terminal.
The business model can be implemented as a neural network, such as a multi-layer fully-connected deep neural network DNN (or called multi-layer perceptron MLP), a convolutional neural network CNN, and so on. The following is described in detail with reference to an example of a multi-layered perceptron MLP.
FIG. 3 shows a schematic diagram of a plurality of network layers in a neural network. Assume the neural network contains L network layers, where an arbitrary l-th network layer is denoted layer l. When l = 1 it is the input layer, when l = L it is the output layer, and otherwise it is an intermediate layer. In the case of an MLP, each neuron in a network layer (other than the output layer) is connected to each neuron in the next network layer. Suppose layer l has n_l neurons and its preceding layer l-1 has n_{l-1} neurons; then there are n_{l-1} × n_l connecting edges from layer l-1 to layer l, each carrying a corresponding weight that serves as a model parameter to be trained. Thus, layer l has n_{l-1} × n_l model parameters, which constitute a parameter matrix of dimension n_{l-1} × n_l, denoted W^(l).
For a neural network that employs Relu as the neuron activation function, the output y^(l) of each network layer l can be expressed by the following formula:

    y^(l) = Relu( (W^(l))^T · y^(l-1) )
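A minimal sketch of this per-layer computation for a bias-free Relu MLP; the layer sizes are arbitrary example values:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(x, weights):
    # weights holds W^(1) ... W^(L); each W^(l) is the n_{l-1} x n_l
    # parameter matrix, and y^(l) = Relu((W^(l))^T y^(l-1)) with y^(0) = x.
    y = x
    for W in weights:
        y = relu(W.T @ y)
    return y

rng = np.random.default_rng(1)
ws = [rng.normal(size=(4, 5)), rng.normal(size=(5, 3)), rng.normal(size=(3, 2))]
out = mlp_forward(rng.normal(size=4), ws)
```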
In the model training process, the parameter matrix W^(l) of each layer needs to be updated through multiple iterations. FIG. 2 shows the process by which such an update is performed in any one iteration. The specific implementation is described below.
First, at step 21, the server determines corresponding random perturbation matrices for a plurality of network layers in the neural network. For network layer l, the server determines a corresponding random perturbation matrix R^(l), which serves as a key for perturbation-encrypting the corresponding parameter matrix W^(l). In general, perturbation encryption masks the true value of the encrypted object by means of a perturbation transform; the effect of the transform can subsequently be eliminated with the corresponding inverse transform. Here, in order to perturbation-encrypt the parameter matrix, the random perturbation matrix R^(l) is generally required to have the same dimensions as the parameter matrix W^(l).
Depending on the specific mode of the perturbation transform, different algorithms can be adopted to determine the perturbation matrix. In one embodiment, the corresponding random perturbation matrix may be independently and randomly generated for each network layer. Specifically, for the network layer l, a corresponding number of matrix elements may be randomly generated according to the dimensions of that layer's parameter matrix, and used as the corresponding random perturbation matrix R^(l).
In another embodiment, corresponding random vectors may be determined for the network layers of the neural network, and the perturbation matrix for each network layer may then be generated based on these random vectors. Specifically, for the network layer l, a random vector r^(l) may be generated such that its dimension equals the number n_l of neurons in layer l. The elements of the random vectors of adjacent network layers are then combined to obtain the corresponding random perturbation matrix.

For example, assume that network layer l has an n_l-dimensional random vector r^(l) and the previous network layer l-1 has an n_(l-1)-dimensional random vector r^(l-1); then r^(l) and r^(l-1) can be combined to obtain a random perturbation matrix of dimension n_(l-1) × n_l for layer l. For the input layer and the output layer, special settings can be made to generate the corresponding random perturbation matrices.
In one specific example, for any network layer l, the random perturbation matrix R^(l) is generated from the random vectors as follows:

R^(l)_ij = r^(1)_j,               for l = 1;
R^(l)_ij = r^(l)_j / r^(l-1)_i,   for 2 <= l <= L-1;
R^(l)_ij = 1 / r^(L-1)_i,         for l = L.    (2)

That is, for the input layer l = 1, the elements of the random vector r^(1) are used as matrix elements to obtain the random perturbation matrix R^(1) corresponding to the input layer.

For an intermediate layer 2 <= l <= L-1, each vector element r^(l)_j of the random vector r^(l) corresponding to layer l is combined with the reciprocal of each vector element r^(l-1)_i of the random vector r^(l-1) corresponding to the previous network layer l-1, and the combined results serve as the matrix elements R^(l)_ij of the random perturbation matrix R^(l) corresponding to layer l.

For the output layer l = L, the reciprocals of the elements of the random vector r^(L-1) corresponding to the previous layer L-1 are used as matrix elements, giving the random perturbation matrix R^(L) corresponding to the output layer.
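As a hedged NumPy sketch of the construction in formula (2) (the helper name is illustrative; uniform positive random vectors are used here only for readability, whereas the actual element distributions are constrained later, in formulas (17) and (18)):

```python
import numpy as np

def perturbation_matrices(r, sizes):
    """Build R(l) per formula (2) from per-layer random vectors.

    sizes = [n_0, ..., n_L]; r[l-1] is the random vector r(l) of layer l,
    of length n_l, given for layers l = 1 .. L-1.
    """
    L = len(sizes) - 1
    R = [np.tile(r[0], (sizes[0], 1))]                 # input layer: R(1)_ij = r(1)_j
    for l in range(2, L):                              # middle layers 2..L-1
        R.append(np.outer(1.0 / r[l - 2], r[l - 1]))   # R(l)_ij = r(l)_j / r(l-1)_i
    R.append(np.tile((1.0 / r[L - 2])[:, None], (1, sizes[L])))  # R(L)_ij = 1/r(L-1)_i
    return R

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]                                   # n_0..n_3 (L = 3 layers)
r = [rng.uniform(0.5, 2.0, size=n) for n in sizes[1:-1]]
R = perturbation_matrices(r, sizes)
```

Each R^(l) then has the same n_(l-1) × n_l dimensions as the corresponding parameter matrix W^(l), as required.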
The above exemplifies an effective method of determining the random perturbation matrix. In other examples, other ways of combining the random vector elements may also be used; for instance, for an intermediate layer, the elements of the random vectors of the two adjacent layers may be multiplied together to obtain the corresponding random perturbation matrix.
In one embodiment, a special random vector may be generated for the output layer l = L, so that an additional perturbation matrix is constructed on top of the above random perturbation matrix for subsequent perturbation encryption.
Specifically, for an output layer comprising n_L neurons, two n_L-dimensional random vectors γ and r_a can be generated, where the elements of r_a are pairwise distinct, and the elements of γ can be divided into several mutually non-overlapping groups whose elements are identical within each group, with γ_i denoting the i-th group. An additional perturbation matrix R_a for the output layer is then constructed from the random vectors γ and r_a, as given by formula (3).
It is to be understood that the random perturbation matrix of each network layer may be predetermined and kept unchanged over multiple iterations, or generated afresh at each iteration. To better guarantee privacy, the random perturbation matrices are preferably regenerated in each iteration.
Having determined the random perturbation matrix of each network layer, it can be used as the key for perturbation-encrypting that layer's parameter matrix. Accordingly, in step 22, the random perturbation matrix is used to perturb the current parameter matrix of the corresponding network layer, obtaining the perturbation encryption parameter matrix of that layer.
According to one embodiment, in this step, for each network layer l, the random perturbation matrix R^(l) corresponding to that layer is combined element-wise with its current parameter matrix W^(l) (for example by multiplicative or additive combination), and the resulting matrix is used as the perturbation encryption parameter matrix Ŵ^(l) of that network layer. In the following, a "^" symbol above a variable denotes the perturbed version of that variable.
In one embodiment, a special setting is made for the output layer. For the output layer L, the random perturbation matrix R^(L) corresponding to it is combined element-wise with its current parameter matrix W^(L), and the additional perturbation matrix R_a for the output layer is superimposed, yielding the perturbation encryption parameter matrix of the output layer.
In a specific example, the process of perturbation-encrypting the parameter matrix of each layer with the random perturbation matrix may be expressed as:

Ŵ^(l) = W^(l) ⊙ R^(l),          for 1 <= l <= L-1;
Ŵ^(L) = W^(L) ⊙ R^(L) + R_a,    for l = L,    (4)

where the operator ⊙ denotes the Hadamard product, i.e. the multiplication of corresponding position elements.

Therefore, according to the example of formula (4), for the first L-1 network layers, the parameter matrix of each layer is multiplied element-wise by its random perturbation matrix to serve as the perturbation encryption parameter matrix; for the output layer L, the additional perturbation matrix R_a is superimposed on the element-wise product to obtain the perturbation encryption parameter matrix.

It should be noted that formula (4) shows one effective perturbation encryption method; in other examples, other perturbation encryption algorithms may be used.

Thus, through the above process, the perturbation encryption parameter matrices Ŵ^(l) of the network layers are obtained, realizing the perturbation encryption of the neural network.
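Formula (4) itself reduces to a few lines of NumPy; the sketch below uses made-up two-layer shapes, and Ra stands in for the additional matrix of formula (3), whose construction is not reproduced here:

```python
import numpy as np

def encrypt_parameters(weights, R, Ra):
    """Formula (4): Hadamard product with R(l) per layer; the output layer
    additionally has Ra superimposed."""
    enc = [W * Rl for W, Rl in zip(weights[:-1], R[:-1])]
    enc.append(weights[-1] * R[-1] + Ra)
    return enc

# Tiny example with made-up 2-layer shapes
W = [np.ones((2, 3)), np.ones((3, 2))]
R = [np.full((2, 3), 2.0), np.full((3, 2), 4.0)]
Ra = np.full((3, 2), 0.5)
enc = encrypt_parameters(W, R, Ra)
```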
Then, in step 23, the server sends the perturbation encryption model to a plurality of terminals, the perturbation encryption model comprising the perturbation encryption parameter matrices corresponding to the plurality of network layers. The plurality of terminals may be the terminals participating in the current round of iterative training, and the first terminal may be any one of them.
The following describes a processing procedure after the terminal side receives the perturbation encryption model, taking the first terminal as an example.
As described above, the first terminal receives the perturbation encryption model through step 23. Then, at step 24, the first terminal processes its local first sample set using the perturbation encryption model to obtain a first noise gradient term for each of the plurality of network layers.
Specifically, the first terminal may input the feature data of each sample in the first sample set into the perturbation encryption model and obtain, through forward propagation, the perturbation output ŷ^(l) of each network layer l. Continuing the ReLU MLP example of formula (1), the perturbation output of each network layer l can be expressed as:

ŷ^(l) = ReLU( (Ŵ^(l))^T · ŷ^(l-1) )    (5)
In the case where the perturbation encryption parameter matrices are obtained with formula (4), comparing the true output of each network layer in formula (1) with the perturbation output in formula (5), it can be deduced that the true output and the perturbation output satisfy the relation given in formula (6), in which a quantity α appears that is determined by γ and r_a, the special random vectors generated for the output layer and used in the aforementioned formula (3).
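To see why the extra output-layer mask R_a matters, note that, under the simplifying assumption of positive random vectors (used here purely for illustration; the patent's actual element distributions follow formulas (17) and (18)), the matrices of formula (2) alone cancel out through the network by the positive homogeneity of ReLU (ReLU(c·x) = c·ReLU(x) for c > 0): each hidden layer's perturbation output is an element-wise rescaling of the true one, and the output layer would reproduce the true output exactly. A hedged numerical check of this cancellation, with made-up sizes and weights:

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda x: np.maximum(x, 0.0)
sizes = [4, 5, 3, 2]
weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
r = [rng.uniform(0.5, 2.0, size=n) for n in sizes[1:-1]]   # positive r(1), r(2)

# Perturbation matrices per formula (2)
R = [np.tile(r[0], (sizes[0], 1)),                         # input layer
     np.outer(1.0 / r[0], r[1]),                           # middle layer
     np.tile((1.0 / r[1])[:, None], (1, sizes[-1]))]       # output layer

x = rng.normal(size=sizes[0])
y_true, y_pert = x, x
for W, Rl in zip(weights, R):
    y_true = relu(W.T @ y_true)
    y_pert = relu((W * Rl).T @ y_pert)

# Without Ra, the perturbation cancels completely at the output layer
assert np.allclose(y_true, y_pert)
```

This is precisely why the additional matrix R_a of formula (3) is superimposed on the output layer in formula (4): otherwise the perturbation-encrypted model would leak the true model output.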
On the basis of the perturbation outputs, back propagation is performed using the label data of each sample and a preset loss function, to obtain a first noise gradient term relating to the gradient of each network layer.
In one embodiment, the loss function is the mean squared error (MSE) loss. In this case, the perturbation loss L̂ is as follows:

L̂ = || ŷ^(L) − ȳ ||²    (7)

where ȳ represents the label data.
From this, the relationship between the true gradient ∇W^(l) and the perturbation gradient ∇Ŵ^(l) can be derived, as given in formula (8), in which V = r^T r and two perturbation terms σ^(l) and β^(l) appear, defined by formulas (9) and (10) respectively; the quantity α in formulas (9) and (10) is the same as defined in formula (6).

Therefore, in the case of the MSE loss function, in order to allow the server to perform perturbation recovery on the gradient, the first noise gradient term determined by the first terminal may include the perturbation gradient ∇Ŵ^(l), the perturbation term σ^(l) of formula (9), and the perturbation term β^(l) of formula (10).
Similarly, but with slight differences, in the case where the loss function takes the form of the cross-entropy (CE) loss, it can be shown that the true and perturbation gradients satisfy the relation given in formula (11), in which a perturbation term ψ^(l) appears, defined by formula (12); β^(l) is as defined by formula (10), and the other parameter terms are as defined in formula (8).

Therefore, in the case of the cross-entropy CE loss function, in order to enable the server to perform perturbation recovery on the gradient, the first noise gradient term determined by the first terminal may include the perturbation gradient ∇Ŵ^(l), the perturbation term ψ^(l) of formula (12), and the perturbation term β^(l) of formula (10).
However, as previously mentioned, in order to protect data privacy it is not desirable for the server to be able to fully recover the true value of the resulting gradient. Therefore, next, in step 25, a second noise is superimposed on the above first noise gradient term to obtain a confusion gradient term for each network layer; and, to meet the requirement of differential privacy, the combination of the second noise with the random perturbation matrix of the corresponding network layer is made to satisfy a Gaussian distribution.
Here, the Gaussian mechanism of differential privacy is briefly introduced. A Gaussian noise algorithm M satisfying differential privacy has the form:

M(d) = f(d) + N(0, S_f² σ²)    (13)

where f is the query function, d is the input data, and N(0, S_f² σ²) is additive Gaussian noise following a normal (Gaussian) distribution with mean 0 and standard deviation S_f σ. Here S_f, the sensitivity of the query function f, is defined as the maximum difference between the results obtained when adjacent data d and d' are input to f.
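As a generic sketch of the Gaussian mechanism of formula (13) (not patent-specific; the example query and its sensitivity bound are assumptions chosen for illustration):

```python
import numpy as np

def gaussian_mechanism(f, d, sensitivity, sigma, rng):
    """Formula (13): release f(d) plus N(0, (S_f * sigma)^2) noise.

    `sensitivity` is S_f, the maximum change of f between adjacent inputs d, d'.
    """
    value = np.asarray(f(d), dtype=float)
    return value + rng.normal(0.0, sensitivity * sigma, size=value.shape)

rng = np.random.default_rng(0)
# Example query: the mean of a dataset bounded in [0, 1]; changing one of the
# n records changes the mean by at most 1/n, so S_f = 1/n here.
data = rng.uniform(size=100)
noisy_mean = gaussian_mechanism(np.mean, data, sensitivity=1.0 / len(data),
                                sigma=2.0, rng=rng)
```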
We take the first noise gradient term of the cross-entropy (CE) form as an example to illustrate the addition of the second noise. Assume that for layer l, in accordance with the Gaussian mechanism, noise of amplitude S_app based on a noise matrix ε^(l) is added to the perturbation gradient in the first noise gradient term; the result of the perturbation recovery performed on the server side is then given by formula (14). That is, the noise matrix ε^(l), multiplied by the noise amplitude and the variance parameter, is used as the second noise added to the first noise gradient term; after the server performs perturbation recovery with the random perturbation matrix R^(l), a final noise ε̃^(l) results, which is the combination of the second noise with the corresponding random perturbation matrix R^(l). In order for this final noise to follow a Gaussian distribution and thereby provide differential privacy protection, the following conditions need to be met.
First, the upper bound of the sensitivity is S_app; it is therefore required that S_app >= ||∇W^(l)||_∞, i.e., the noise amplitude is not smaller than the (infinity-order) norm of the gradient.

In addition, R^(l) ⊙ ε^(l) should satisfy a Gaussian distribution; that is, the combination R^(l)_ij · ε^(l)_ij of each matrix element of the random perturbation matrix R^(l) with the corresponding element of the noise matrix ε^(l) should satisfy a Gaussian distribution. For this purpose, the decomposition principle of the Gaussian distribution is used.
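The two conditions can be sketched as follows; the helper name is illustrative, S_app is taken here as the maximum absolute gradient entry (so the amplitude condition holds by construction), and a zero noise matrix keeps the check deterministic, since sampling ε^(l) from the DN(k) distributions of formula (16) is not reproduced here:

```python
import numpy as np

def add_second_noise(grad_pert, R, sigma, eps):
    """Superimpose second noise on a perturbation gradient.

    The second noise is S_app * sigma * eps; after server-side recovery the
    final noise is R * (S_app * sigma * eps), whose elements the patent's
    DN(k) distribution constraints are meant to make Gaussian.
    """
    s_app = np.max(np.abs(grad_pert))     # amplitude >= infinity-order norm
    second = s_app * sigma * eps
    return grad_pert + second, R * second

grad = np.array([[1.0, -3.0], [2.0, 0.0]])
R = np.ones_like(grad)
eps = np.zeros_like(grad)                 # zero noise: confusion term == gradient
noisy, final_noise = add_second_noise(grad, R, sigma=1.0, eps=eps)
```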
It can be mathematically proven (see, for example, Iosif Pinelis. 2018. The exp-normal distribution is infinitely divisible. CoRR (2018)) that, if Z is a Gaussian-distributed variable and k is a natural number, Z can be decomposed into the product of k independent identically distributed variables W_1, ..., W_k:

Z = W_1 · W_2 · ... · W_k    (15)

with the distribution of each factor W_i given by formula (16), in which G_(1/k),0, G_(1/k),1, ... are independent identically distributed variables, each following the Gamma distribution Γ(1/k, 1) with shape parameter 1/k and scale parameter 1.
Hereinafter, such a decomposition of a Gaussian distribution into k factor variables is referred to as a k-ary decomposition, and the distribution of each factor variable is referred to as the k-ary decomposition data distribution of the Gaussian distribution, denoted DN(k). This principle can be used to solve the problem of making R^(l)_ij · ε^(l)_ij Gaussian.
In one embodiment, the server independently and randomly generates each element R^(l)_ij of the random perturbation matrix R^(l) for each network layer in the aforementioned step 21. In such a case, the generation may be further constrained so that each element R^(l)_ij conforms to the binary decomposition data distribution of the Gaussian distribution, namely DN(2); meanwhile, each element ε^(l)_ij of the noise matrix ε^(l) generated by the first terminal should also conform to DN(2). Their combination R^(l)_ij · ε^(l)_ij then satisfies a Gaussian distribution.
In another embodiment, in the aforementioned step 21, the server generates the corresponding random perturbation matrices by combining the random vectors of the respective layers. For example, when the server determines the random perturbation matrix R^(l) according to formula (2), R^(l)_ij is expressed as follows:

R^(l)_ij = r^(1)_j,               for l = 1;
R^(l)_ij = r^(l)_j / r^(l-1)_i,   for 2 <= l <= L-1;
R^(l)_ij = 1 / r^(L-1)_i,         for l = L.    (17)
It can be seen that the elements of the random perturbation matrix R^(l) arise from different combinations of the elements of the random vectors, which may impose contradictory requirements on those vector elements if the Gaussian-distribution requirement is to be met. For example, for an intermediate layer 2 <= l <= L-1, the product R^(l)_ij · ε^(l)_ij decomposes into a product of three factors, each of which should conform to DN(3). Then for the network layer l, when it is treated as the current layer, each vector element r^(l)_j of its random vector is required to conform to DN(3); yet when analyzing layer l+1, with layer l as the previous layer, the reciprocal 1/r^(l)_i is required to conform to DN(3). This imposes conflicting requirements on the data distribution of the vector elements r^(l)_j.
To address such conflicting requirements, in one embodiment, the original neural network is expanded. FIG. 4 shows a schematic diagram of the extended neural network. Assuming that the original neural network comprises N actual network layers, a transition network layer is inserted between each pair of adjacent actual network layers, forming N-1 transition network layers. The actual network layers are shown in FIG. 4 with bold solid frames and the transition network layers with dashed boxes. Thus, in the extended neural network of L = 2N-1 layers, the odd layers are actual network layers and the even layers are transition network layers.
Each actual network layer has a parameter matrix W^(l) to be trained, whereas each transition network layer has the identity matrix I as its parameter matrix; this matrix is fixed and needs no training, and only passes the output of the previous actual network layer, unchanged, to the next actual network layer. The transition network layers therefore have no influence on the processing of actual sample data and serve only to assist in generating the required random vectors.
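The pass-through property holds because ReLU(I^T y) = ReLU(y) = y whenever y is the (already non-negative) ReLU output of the previous actual layer. A quick illustrative check:

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)
y_prev = relu(np.random.default_rng(0).normal(size=5))  # output of an actual layer
I = np.eye(5)                                           # transition layer parameters
assert np.array_equal(relu(I.T @ y_prev), y_prev)       # passed through unchanged
```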
Thus, based on the extended neural network, when the server determines the random perturbation matrices from the random vectors in the manner of formula (2), the elements r^(l)_j of the random vector of layer l should follow the data distribution:

r^(l)_j ~ DN(2),       for the input layer l = 1;
r^(l)_j ~ DN(3),       for odd intermediate layers 3 <= l <= L-2 (actual network layers);
1/r^(l)_j ~ DN(3),     for even layers 2 <= l <= L-3 (transition network layers);
1/r^(L-1)_j ~ DN(2),   for the last transition network layer l = L-1.    (18)

That is, if the network layer l is an intermediate layer among the actual network layers (an odd layer), each vector element of its random vector conforms to DN(3), the ternary decomposition data distribution of the Gaussian distribution; if it is a transition network layer (an even layer) other than the last, the reciprocal of each vector element of its random vector conforms to DN(3). For the last transition network layer (layer L-1), the reciprocals of the vector elements of its random vector conform to DN(2). In addition, for the input layer, the vector elements of its random vector conform to DN(2), the binary decomposition data distribution of the Gaussian distribution.
When the extended neural network is employed, in one embodiment, the server generates random vectors for all network layers according to the data distribution shown in formula (18), but determines the random perturbation matrices R^(l), and correspondingly computes the perturbation encryption parameter matrices Ŵ^(l), only for the actual network layers. Accordingly, in step 23 the server sends only the perturbation encryption parameter matrices of the actual network layers to the first terminal.
In another embodiment, the server may determine random perturbation matrices and compute perturbation encryption parameter matrices for all network layers, and send the perturbation encryption parameter matrices of all network layers to the first terminal as the perturbation encryption model.
Accordingly, if the first terminal receives only the actual network layers, it performs normal neural network forward propagation and back propagation in step 24. If the first terminal receives all network layers, the even (transition) layers may be omitted, and only the actual network layers participate in forward propagation and back propagation in step 24.
And when the noise matrix ε^(l) for layer l is determined in step 25, its matrix elements are made to satisfy the following constraints:

ε^(l)_ij ~ DN(3),   for intermediate layers among the actual network layers (odd layers);
ε^(l)_ij ~ DN(2),   for the input layer and the output layer.    (19)

That is, for an intermediate layer among the actual network layers (an odd layer), each matrix element satisfies the ternary decomposition data distribution DN(3) of the Gaussian distribution; for the input layer and the output layer, each matrix element satisfies the binary decomposition data distribution DN(2) of the Gaussian distribution.
Combining the above formulas (17), (18) and (19), it can be proven that R^(l)_ij · ε^(l)_ij satisfies a Gaussian distribution. Specifically:

for the input layer, both r^(1)_j and ε^(1)_ij conform to DN(2), so their product conforms to a Gaussian distribution;

for the intermediate odd layers, i.e. the intermediate layers among the actual network layers, r^(l)_j, 1/r^(l-1)_i and ε^(l)_ij each conform to DN(3), so their product conforms to a Gaussian distribution;

for the output layer, both 1/r^(L-1)_i and ε^(L)_ij conform to DN(2), so their product conforms to a Gaussian distribution.
Next, in step 26, the first terminal sends the confusion gradient terms, with the second noise superimposed, to the server. For example, in the case of cross-entropy loss, the noisy perturbation gradient and the perturbation terms ψ^(l) and β^(l) are sent to the server. In the case of MSE loss, the noisy perturbation gradient and the perturbation terms σ^(l) and β^(l) are sent to the server.
Then, in step 27, the server uses the random perturbation matrix R^(l) corresponding to each network layer to perform recovery processing on the confusion gradient terms of the corresponding network layers, obtaining gradient recovery results for the plurality of network layers.
For example, in the case of cross-entropy loss, the server may perform the recovery processing using the matrix R^(l) according to formula (14); in the case of MSE loss, a similar recovery processing may be performed according to formula (8). The recovery yields the true gradient with the final noise superimposed on it. Since it has already been ensured that this final noise satisfies a Gaussian distribution, it meets the requirement of differential privacy.
Then, in step 28, the server may aggregate the gradient recovery results obtained from the terminals, and update the current parameter matrix of each network layer according to the aggregation result, thereby implementing an iterative update of the model parameters.
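Step 28 is a standard gradient aggregation and descent step; a generic sketch, in which the simple averaging and the learning rate are assumptions rather than details specified by the patent:

```python
import numpy as np

def aggregate_and_update(weights, grads_per_terminal, lr=0.1):
    """Average each layer's recovered gradients over terminals, then take a
    gradient-descent step on that layer's current parameter matrix."""
    for l in range(len(weights)):
        g = np.mean([g_t[l] for g_t in grads_per_terminal], axis=0)
        weights[l] = weights[l] - lr * g
    return weights

# Two terminals, one 2x2 layer, made-up gradients
weights = [np.zeros((2, 2))]
grads = [[np.ones((2, 2))], [3.0 * np.ones((2, 2))]]
weights = aggregate_and_update(weights, grads, lr=0.1)
```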
Reviewing the above process, it can be seen that during the iterative model update, the server perturbation-encrypts the parameters of each network layer before issuing them to the terminals. A terminal can process its sample data directly on the basis of the perturbation encryption model to obtain the perturbation gradients. The perturbation encryption of the model parameters thus ensures the security of the model parameters on one hand; on the other hand, it allows the terminal to process samples directly without decryption, greatly saving the terminal's computing resources. Such an approach is particularly advantageous for small terminals with limited computing power. In addition, after obtaining the perturbation gradients, the terminal adds noise to them such that the final noise after perturbation recovery conforms to a Gaussian distribution, thereby meeting the requirement of differential privacy. As a result, neither the server nor any other party can obtain the true values of the gradients, further ensuring the security of the private data.
According to an embodiment of another aspect, an apparatus for jointly training a business model based on privacy protection is further provided, and the apparatus is deployed in a server, and the server may be implemented as any device or device cluster having computing and processing capabilities. FIG. 5 shows a schematic diagram of a training apparatus deployed in a server, according to one embodiment. As shown in fig. 5, the training apparatus 500 includes:
a disturbance matrix determination unit 51 configured to determine, for a plurality of network layers in the neural network, corresponding random disturbance matrices;
the disturbance encryption unit 52 is configured to perform disturbance processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain a disturbance encryption parameter matrix of the network layer;
a sending unit 53, configured to send a disturbed encryption model to a plurality of terminals, where the disturbed encryption model includes disturbed encryption parameter matrices corresponding to the plurality of network layers;
a receiving unit 54 configured to receive, from a first terminal that is any one of the plurality of terminals, confusion gradient terms respectively corresponding to the plurality of network layers, where each confusion gradient term is obtained by superimposing a second noise on a first noise gradient term, the first noise gradient term is obtained by processing a first sample set local to the first terminal using the perturbation encryption model, and the combination of the second noise with the random perturbation matrix of the corresponding network layer satisfies a Gaussian distribution;
a disturbance recovery unit 55, configured to perform recovery processing on the confusion gradient items of the corresponding network layers by using the random disturbance matrices corresponding to the plurality of network layers, so as to obtain gradient recovery results of the plurality of network layers;
an aggregation updating unit 56 configured to aggregate the gradient restoration results corresponding to the plurality of terminals, and update the current parameter matrices of the plurality of network layers according to the aggregation result.
According to one embodiment, the disturbance matrix determination unit 51 comprises (not shown): a random vector determination module configured to determine, for each network layer of the neural network, a corresponding random vector, dimensions of which are the same as the number of neurons in the corresponding network layer; and the matrix determination module is configured to determine a random disturbance matrix corresponding to a first network layer according to the random vector of the first network layer and the random vector of an adjacent network layer, wherein the first network layer is an intermediate layer in the neural network.
In one embodiment, the neural network comprises N actual network layers to be trained, and N-1 transition network layers inserted between adjacent actual network layers, wherein the transition network layers have fixed identity matrixes as parameter matrixes thereof; in such a case, the random vector determination module is specifically configured to: determining a first random vector aiming at each middle layer in an actual network layer; determining a second random vector for each transition network layer; wherein vector elements in the first random vector and the second random vector have different data distributions.
Further, in an example, each vector element in the first random vector conforms to a ternary decomposition data distribution of a gaussian distribution; the reciprocal of each vector element in the second random vector accords with the ternary decomposition data distribution of Gaussian distribution; the first network layer is an intermediate layer in an actual network layer; in such a case, the matrix determination module is specifically configured to: and respectively combining each vector element in the first random vector corresponding to the first network layer with the reciprocal of each vector element in the second random vector corresponding to the previous transition network layer of the first network layer, and taking the combined result as each matrix element in the random disturbance matrix corresponding to the first network layer.
In one embodiment, the random vector determination module is further configured to: determining a third random vector aiming at an input layer in an actual network layer, wherein each vector element accords with the binary decomposition data distribution of Gaussian distribution; and, the matrix determination module is further configured to: and taking the elements in the third random vector corresponding to the input layer as matrix elements to obtain a random disturbance matrix corresponding to the input layer.
In one embodiment, the random vector determination module is further configured to determine, for the last transition network layer, a last second random vector, wherein reciprocals of respective vector elements conform to a binary decomposition data distribution of a gaussian distribution; and, the matrix determination module is further configured to: and aiming at an output layer in the actual network layer, taking the reciprocal of each element in the last second random vector as a matrix element to obtain a random disturbance matrix corresponding to the output layer.
According to one embodiment, the aforementioned plurality of network layers are the N actual network layers to be trained.
According to one embodiment, the perturbation encryption unit 52 is specifically configured to: for each network layer except the output layer in the N actual network layers, performing corresponding position element combination on the random disturbance matrix corresponding to the network layer and the current parameter matrix to obtain a disturbance encryption parameter matrix of the network layer; and for the output layer, performing corresponding position element combination on the random disturbance matrix corresponding to the output layer and the current parameter matrix of the output layer, and superposing an additional disturbance matrix aiming at the output layer to obtain a disturbance encryption parameter matrix of the output layer.
In various embodiments, the business model is used to predict business objects, the business objects including one of: user, merchant, transaction, image, text, audio.
According to an embodiment of another aspect, an apparatus for jointly training a business model based on privacy protection is also provided, and the apparatus is deployed in a first terminal, and the first terminal may be any type of terminal computing device. Fig. 6 shows a schematic diagram of a training apparatus deployed in a terminal according to one embodiment. As shown in fig. 6, the training apparatus 600 includes:
a receiving unit 61, configured to receive a perturbation encryption model from a server, where the perturbation encryption model includes perturbation encryption parameter matrices corresponding to a plurality of network layers in the neural network, and the perturbation encryption parameter matrices are obtained by performing perturbation processing on current parameter matrices of the network layers by using random perturbation matrices of the corresponding network layers;
a gradient obtaining unit 62, configured to process a first sample set local to the first terminal by using the perturbed encryption model, to obtain a first noise gradient term for each of the multiple network layers;
a noise adding unit 63 configured to superimpose a second noise on the first noise gradient term to obtain a confusion gradient term for each network layer; wherein a combination result of the second noise and the random disturbance matrix of the corresponding network layer satisfies a Gaussian distribution;
a sending unit 64, configured to send the confusion gradient items to the server, so that the server recovers the confusion gradient items of the corresponding network layers by using the random perturbation matrix, and aggregates the recovery gradients corresponding to the multiple terminals, thereby updating the current parameter matrices of the multiple network layers.
In one embodiment, the neural network comprises N actual network layers to be trained, and N-1 transition network layers inserted between adjacent actual network layers, wherein the transition network layers have fixed identity matrixes as parameter matrixes thereof; the plurality of network layers are the N actual network layers to be trained.
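The layer arrangement described above (N actual layers interleaved with N-1 identity transition layers) can be sketched as below. The weight-shape convention (inputs × outputs) and the function name are assumptions for illustration only.

```python
import numpy as np

def build_layer_stack(actual_weights):
    """Sketch: interleave the N actual network layers to be trained with N-1
    transition layers whose parameter matrix is a fixed identity, so each
    transition layer passes activations through unchanged."""
    stack = []
    for i, W in enumerate(actual_weights):
        stack.append(("actual", W))
        if i < len(actual_weights) - 1:
            stack.append(("transition", np.eye(W.shape[1])))  # fixed identity matrix
    return stack
```

For N = 3 actual layers this yields a 5-layer stack, with identity matrices sized to each actual layer's output dimension.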
According to one embodiment, the gradient acquisition unit 62 is specifically configured to: inputting the characteristic data of each sample in the first sample set into the disturbance encryption model to obtain disturbance output of each network layer; and obtaining the first noise gradient term according to the disturbance output of each network layer, the label data of each sample and a preset loss function.
In one embodiment, the noise adding unit 63 is specifically configured to: determining a noise matrix corresponding to each network layer; and multiplying the noise matrix by preset noise amplitude and variance to serve as second noise corresponding to each network layer, and superposing the second noise on the first noise gradient item, wherein the noise amplitude is not less than the norm of the gradient.
Further, the determining, by the noise adding unit 63, the noise matrix corresponding to each network layer may include: for an intermediate layer in the plurality of network layers, determining a first noise matrix, wherein each matrix element satisfies a ternary decomposition data distribution of a Gaussian distribution; for an input layer and an output layer of the plurality of network layers, a second noise matrix is determined, wherein each matrix element satisfies a binary decomposition data distribution of a Gaussian distribution.
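The scaling rule of the noise adding unit (noise matrix multiplied by a preset amplitude and variance, with the amplitude not less than the gradient norm) can be sketched as follows. A standard normal matrix stands in for the patent's ternary/binary decomposition distributions, which differ between intermediate and input/output layers and are not specified here.

```python
import numpy as np

def add_second_noise(grad, sigma, rng):
    """Sketch: scale a noise matrix by the preset amplitude C (chosen to be
    no less than the norm of the gradient) and the variance parameter sigma,
    then superimpose it on the first noise gradient term."""
    C = max(1.0, np.linalg.norm(grad))   # amplitude not less than the gradient norm
    Z = rng.standard_normal(grad.shape)  # placeholder for the layer's noise matrix
    return grad + C * sigma * Z          # resulting confusion gradient term
```

Setting `sigma=0` recovers the original gradient term, which isolates the effect of the variance parameter.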
With the above apparatus, the perturbation encryption of the model and the differential-privacy noise processing of the gradients can be used together to protect the model parameters and gradient data from leakage, thereby ensuring the security of private data.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The foregoing embodiments further describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that the above are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (18)

1. A method for jointly training a business model based on privacy protection, the business model being implemented by a neural network, the method being performed by a server, comprising:
determining corresponding random disturbance matrixes for a plurality of network layers in the neural network;
carrying out disturbance processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain a disturbance encryption parameter matrix of the network layer;
sending a disturbance encryption model to a plurality of terminals, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes corresponding to the plurality of network layers;
receiving confusion gradient items respectively corresponding to the plurality of network layers from a first terminal, which is any one of the plurality of terminals, wherein the confusion gradient items are obtained by superposing second noise on first noise gradient items, the first noise gradient items are obtained by processing a first sample set local to the first terminal by using the perturbation encryption model, and the combination result of the second noise and the random perturbation matrix of the corresponding network layer satisfies a Gaussian distribution;
restoring the confusion gradient items of the corresponding network layers by using the random disturbance matrixes corresponding to the network layers to obtain gradient restoration results of the network layers;
and aggregating the gradient recovery results corresponding to the plurality of terminals, and updating the current parameter matrixes of the plurality of network layers according to the aggregation result.
2. The method of claim 1, wherein determining, for a plurality of network layers in the neural network, corresponding random perturbation matrices comprises:
for each network layer of the neural network, determining a corresponding random vector, wherein the dimensionality of the random vector is the same as the number of neurons in the corresponding network layer;
and determining a random disturbance matrix corresponding to a first network layer according to the random vector of the first network layer and the random vector of the adjacent network layer, wherein the first network layer is an intermediate layer in the neural network.
3. The method of claim 2, wherein the neural network comprises N actual network layers to be trained, and N-1 transition network layers interposed between adjacent actual network layers, the transition network layers having a fixed identity matrix as their parameter matrices;
for each network layer of the neural network, determining a corresponding random vector, comprising:
determining a first random vector aiming at each middle layer in an actual network layer;
determining a second random vector for each transition network layer; wherein vector elements in the first random vector and the second random vector have different data distributions.
4. The method of claim 3, wherein each vector element in the first random vector conforms to a ternary decomposition data distribution of a Gaussian distribution; the reciprocal of each vector element in the second random vector accords with the ternary decomposition data distribution of Gaussian distribution; the first network layer is an intermediate layer in an actual network layer;
determining a random disturbance matrix corresponding to a first network layer according to the random vector of the first network layer and the random vector of an adjacent network layer, wherein the random disturbance matrix comprises:
and respectively combining each vector element in the first random vector corresponding to the first network layer with the reciprocal of each vector element in the second random vector corresponding to the previous transition network layer of the first network layer, and taking the combined result as each matrix element in the random disturbance matrix corresponding to the first network layer.
5. The method of claim 3, wherein determining, for each network layer of the neural network, a corresponding random vector, further comprises:
determining a third random vector aiming at an input layer in an actual network layer, wherein each vector element accords with the binary decomposition data distribution of Gaussian distribution;
determining, for a plurality of network layers in the neural network, corresponding random perturbation matrices, further comprising:
and taking the elements in the third random vector corresponding to the input layer as matrix elements to obtain a random disturbance matrix corresponding to the input layer.
6. The method of claim 3, wherein determining a second random vector for each transition network layer comprises: determining a last second random vector aiming at a last transition network layer, wherein the reciprocal of each vector element accords with the binary decomposition data distribution of Gaussian distribution;
determining, for a plurality of network layers in the neural network, corresponding random perturbation matrices, further comprising:
and aiming at an output layer in the actual network layer, taking the reciprocal of each element in the last second random vector as a matrix element to obtain a random disturbance matrix corresponding to the output layer.
7. The method of claim 3, wherein the plurality of network layers are the N actual network layers to be trained.
8. The method according to claim 7, wherein the perturbing the current parameter matrix of the corresponding network layer by using the random perturbation matrix to obtain the perturbed encryption parameter matrix of the network layer comprises:
for each network layer except the output layer in the N actual network layers, performing corresponding position element combination on the random disturbance matrix corresponding to the network layer and the current parameter matrix to obtain a disturbance encryption parameter matrix of the network layer;
and for the output layer, performing corresponding position element combination on the random disturbance matrix corresponding to the output layer and the current parameter matrix of the output layer, and superposing an additional disturbance matrix aiming at the output layer to obtain a disturbance encryption parameter matrix of the output layer.
9. The method of claim 1, wherein the business model is used to predict business objects, the business objects comprising one of: user, merchant, transaction, image, text, audio.
10. A method for jointly training a business model based on privacy protection, the business model being implemented by a neural network, the method being performed by a first terminal and comprising:
receiving a disturbance encryption model from a server, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes respectively corresponding to a plurality of network layers in the neural network, and the disturbance encryption parameter matrixes are obtained by performing disturbance processing on current parameter matrixes of the network layers by using random disturbance matrixes of the corresponding network layers;
processing a first sample set local to the first terminal by using the disturbance encryption model to obtain a first noise gradient item aiming at each network layer in the plurality of network layers;
superposing second noise on the first noise gradient term to obtain a confusion gradient term aiming at each network layer; wherein a combined result of the second noise and the random disturbance matrix of the corresponding network layer satisfies a Gaussian distribution;
and sending the confusion gradient items to the server, so that the server recovers the confusion gradient items of the corresponding network layers by using the random disturbance matrices, and aggregates the recovery gradients corresponding to a plurality of terminals, thereby updating the current parameter matrices of the plurality of network layers.
11. The method of claim 10, wherein the neural network comprises N actual network layers to be trained, and N-1 transition network layers interposed between adjacent actual network layers, the transition network layers having a fixed identity matrix as their parameter matrices;
the plurality of network layers are the N actual network layers to be trained.
12. The method of claim 10, wherein processing a first set of samples local to the first terminal using the perturbed cipher model to obtain a first noise gradient term for each of the plurality of network layers comprises:
inputting the characteristic data of each sample in the first sample set into the disturbance encryption model to obtain disturbance output of each network layer;
and obtaining the first noise gradient term according to the disturbance output of each network layer, the label data of each sample and a preset loss function.
13. The method of claim 10, wherein superimposing a second noise on the first noise gradient term comprises:
determining a noise matrix corresponding to each network layer;
and multiplying the noise matrix by preset noise amplitude and variance to serve as second noise corresponding to each network layer, and superposing the second noise on the first noise gradient item, wherein the noise amplitude is not less than the norm of the gradient.
14. The method of claim 13, wherein determining the noise matrix for each network layer comprises:
for an intermediate layer in the plurality of network layers, determining a first noise matrix, wherein each matrix element satisfies a ternary decomposition data distribution of a Gaussian distribution;
for an input layer and an output layer of the plurality of network layers, a second noise matrix is determined, wherein each matrix element satisfies a binary decomposition data distribution of a Gaussian distribution.
15. An apparatus for jointly training a business model based on privacy protection, wherein the business model is implemented by a neural network, and the apparatus is deployed in a server, and comprises:
a disturbance matrix determination unit configured to determine, for a plurality of network layers in the neural network, corresponding random disturbance matrices;
the disturbance encryption unit is configured to perform disturbance processing on the current parameter matrix of the corresponding network layer by using the random disturbance matrix to obtain a disturbance encryption parameter matrix of the network layer;
a sending unit configured to send a disturbed encryption model to a plurality of terminals, where the disturbed encryption model includes disturbed encryption parameter matrices corresponding to the plurality of network layers;
a receiving unit, configured to receive confusion gradient terms respectively corresponding to the plurality of network layers from a first terminal, which is any one of the plurality of terminals, where the confusion gradient terms are obtained by superimposing a second noise on a first noise gradient term, where the first noise gradient term is obtained by processing a first sample set local to the first terminal using the perturbation encryption model, and a combination result of the second noise and the random perturbation matrix of the corresponding network layer satisfies a Gaussian distribution;
the disturbance recovery unit is configured to perform recovery processing on the confusion gradient items of the corresponding network layers by using the random disturbance matrixes corresponding to the network layers to obtain gradient recovery results of the network layers;
and the aggregation updating unit is configured to aggregate the gradient recovery results corresponding to the plurality of terminals, and update the current parameter matrixes of the plurality of network layers according to the aggregation result.
16. An apparatus for jointly training a business model based on privacy protection, the business model being implemented by a neural network, the apparatus being deployed in a first terminal, comprising:
the receiving unit is configured to receive a disturbance encryption model from a server, wherein the disturbance encryption model comprises disturbance encryption parameter matrixes respectively corresponding to a plurality of network layers in the neural network, and the disturbance encryption parameter matrixes are obtained by performing disturbance processing on current parameter matrixes of the network layers by using random disturbance matrixes of the corresponding network layers;
a gradient obtaining unit configured to process a first sample set local to the first terminal by using the perturbation encryption model to obtain a first noise gradient item for each of the plurality of network layers;
the noise adding unit is configured to superimpose second noise on the first noise gradient term to obtain a confusion gradient term aiming at each network layer; wherein a combined result of the second noise and the random disturbance matrix of the corresponding network layer satisfies a Gaussian distribution;
and the sending unit is configured to send the confusion gradient items to the server, so that the server recovers the confusion gradient items of the corresponding network layers by using the random disturbance matrix, and aggregates the recovery gradients corresponding to the plurality of terminals, thereby updating the current parameter matrixes of the plurality of network layers.
17. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-14.
18. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any of claims 1-14.
CN202011409592.4A 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection Active CN112541593B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011409592.4A CN112541593B (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection
CN202210742526.1A CN114936650A (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011409592.4A CN112541593B (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202210742526.1A Division CN114936650A (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection

Publications (2)

Publication Number Publication Date
CN112541593A CN112541593A (en) 2021-03-23
CN112541593B true CN112541593B (en) 2022-05-17

Family

ID=75015994

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202011409592.4A Active CN112541593B (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection
CN202210742526.1A Pending CN114936650A (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202210742526.1A Pending CN114936650A (en) 2020-12-06 2020-12-06 Method and device for jointly training business model based on privacy protection

Country Status (1)

Country Link
CN (2) CN112541593B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011587B (en) * 2021-03-24 2022-05-10 支付宝(杭州)信息技术有限公司 Privacy protection model training method and system
CN113157399B (en) * 2021-05-17 2022-11-11 北京冲量在线科技有限公司 Unsupervised joint modeling method based on ARM architecture chip
CN113435592B (en) * 2021-05-22 2023-09-22 西安电子科技大学 Neural network multiparty collaborative lossless training method and system with privacy protection
CN113255002B (en) * 2021-06-09 2022-07-15 北京航空航天大学 Federal k nearest neighbor query method for protecting multi-party privacy
CN113222480B (en) * 2021-06-11 2023-05-12 支付宝(杭州)信息技术有限公司 Training method and device for challenge sample generation model
CN113221183B (en) * 2021-06-11 2022-09-16 支付宝(杭州)信息技术有限公司 Method, device and system for realizing privacy protection of multi-party collaborative update model
CN113722760A (en) * 2021-09-06 2021-11-30 支付宝(杭州)信息技术有限公司 Privacy protection model training method and system
CN113821732B (en) * 2021-11-24 2022-02-18 阿里巴巴达摩院(杭州)科技有限公司 Item recommendation method and equipment for protecting user privacy and learning system
CN115001748B (en) * 2022-04-29 2023-11-03 北京奇艺世纪科技有限公司 Model processing method and device and computer readable storage medium
CN114782176B (en) * 2022-06-23 2022-10-25 浙江数秦科技有限公司 Credit service recommendation method based on federal learning

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214404A (en) * 2017-07-07 2019-01-15 阿里巴巴集团控股有限公司 Training sample generation method and device based on secret protection
CN109388662B (en) * 2017-08-02 2021-05-25 创新先进技术有限公司 Model training method and device based on shared data
US10657259B2 (en) * 2017-11-01 2020-05-19 International Business Machines Corporation Protecting cognitive systems from gradient based attacks through the use of deceiving gradients
KR102061345B1 (en) * 2017-12-18 2019-12-31 경희대학교 산학협력단 Method of performing encryption and decryption based on reinforced learning and client and server system performing thereof
CN108520181B (en) * 2018-03-26 2022-04-22 联想(北京)有限公司 Data model training method and device
CN109034228B (en) * 2018-07-17 2021-10-12 陕西师范大学 Image classification method based on differential privacy and hierarchical relevance propagation
CN111788584A (en) * 2018-08-21 2020-10-16 华为技术有限公司 Neural network computing method and device
US20200104678A1 (en) * 2018-09-27 2020-04-02 Google Llc Training optimizer neural networks
CN109495476B (en) * 2018-11-19 2020-11-20 中南大学 Data stream differential privacy protection method and system based on edge calculation
US11599774B2 (en) * 2019-03-29 2023-03-07 International Business Machines Corporation Training machine learning model
CN111461215B (en) * 2020-03-31 2021-06-29 支付宝(杭州)信息技术有限公司 Multi-party combined training method, device, system and equipment of business model
CN111177792B (en) * 2020-04-10 2020-06-30 支付宝(杭州)信息技术有限公司 Method and device for determining target business model based on privacy protection
CN111177768A (en) * 2020-04-10 2020-05-19 支付宝(杭州)信息技术有限公司 Method and device for protecting business prediction model of data privacy joint training by two parties
CN111324911B (en) * 2020-05-15 2021-01-01 支付宝(杭州)信息技术有限公司 Privacy data protection method, system and device
CN111723404B (en) * 2020-08-21 2021-01-22 支付宝(杭州)信息技术有限公司 Method and device for jointly training business model

Also Published As

Publication number Publication date
CN112541593A (en) 2021-03-23
CN114936650A (en) 2022-08-23

Similar Documents

Publication Publication Date Title
CN112541593B (en) Method and device for jointly training business model based on privacy protection
CN111160573B (en) Method and device for protecting business prediction model of data privacy joint training by two parties
WO2021197037A1 (en) Method and apparatus for jointly performing data processing by two parties
CN112989368B (en) Method and device for processing private data by combining multiple parties
CN111178549B (en) Method and device for protecting business prediction model of data privacy joint training by two parties
CN111177791B (en) Method and device for protecting business prediction model of data privacy joint training by two parties
CN112199702B (en) Privacy protection method, storage medium and system based on federal learning
US11222138B2 (en) Privacy-preserving machine learning in the three-server model
Zheng et al. Gan-based key secret-sharing scheme in blockchain
CN111177768A (en) Method and device for protecting business prediction model of data privacy joint training by two parties
CN113221183B (en) Method, device and system for realizing privacy protection of multi-party collaborative update model
CN111738361A (en) Joint training method and device for business model
CN112805769B (en) Secret S-type function calculation system, secret S-type function calculation device, secret S-type function calculation method, and recording medium
CN112101946B (en) Method and device for jointly training business model
CN112101531B (en) Neural network model training method, device and system based on privacy protection
CN111291411B (en) Safe video anomaly detection system and method based on convolutional neural network
CN115842627A (en) Decision tree evaluation method, device, equipment and medium based on secure multi-party computation
CN116561787A (en) Training method and device for visual image classification model and electronic equipment
Zhang et al. SecureTrain: An approximation-free and computationally efficient framework for privacy-preserved neural network training
Shukur et al. Asymmetrical novel hyperchaotic system with two exponential functions and an application to image encryption
CN115186876A (en) Method and device for protecting data privacy of two-party joint training service prediction model
Wang et al. SieveNet: decoupling activation function neural network for privacy-preserving deep learning
CN117634633A (en) Method for federal learning and federal learning system
CN114358323A (en) Third-party-based efficient Pearson coefficient calculation method in federated learning environment
CN114547684A (en) Method and device for protecting multi-party joint training tree model of private data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant