CN117196012A - Differential privacy-based personalized federated learning identification method and system - Google Patents


Info

Publication number
CN117196012A
CN117196012A (Application CN202311150942.3A)
Authority
CN
China
Prior art date
Legal status
Pending
Application number
CN202311150942.3A
Other languages
Chinese (zh)
Inventor
瞿治国
汤洋
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202311150942.3A
Publication of CN117196012A
Legal status: Pending


Abstract

The invention discloses a personalized federated learning identification method and system based on differential privacy. The method is executed at a client and comprises the following steps: acquiring initialization model parameters and loading them into a pre-built local model; performing noise processing on the parameter-loaded local model with a personalized differential privacy algorithm based on a privacy budget, and sending the shared-layer model parameters of the noise-processed local model to a server; loading the aggregated shared-layer model parameters into the shared layer of the noise-processed local model and then fine-tuning locally to obtain a personalized model; and repeatedly training the personalized model until the number of global iterations is reached, obtaining a differential-privacy-based personalized federated learning test model. Each client can process its local model according to the privacy budget it selects, adapting to the different privacy requirements of different users, and combines this with the Adam algorithm to reduce the influence of the added noise on model accuracy.

Description

Differential privacy-based personalized federated learning identification method and system
Technical Field
The invention relates to the technical field of information security, and in particular to a personalized federated learning identification method and system based on differential privacy.
Background
With the rapid development of big-data-driven artificial intelligence, various intelligent driving applications improve the safety and efficiency of vehicle driving through real-time inference with deep convolutional networks. However, the information sharing of intelligent connected vehicles brings potential safety and privacy hazards. It is therefore important to limit central access to data, and federated learning was developed to address this problem.
Federated learning allows multiple parties to learn a shared predictive model collaboratively through the parameters of locally trained models, while keeping the original training data local (e.g., on an in-vehicle terminal). Federated learning can thus provide effective privacy protection in the collaboration of a large number of participants, iteratively training a particular machine learning model in a distributed computing manner. In federated learning, a user's original data does not need to be uploaded to a central aggregation server, ensuring each participant's data security while sharing each participating client's locally trained model. Since the data is distributed across different terminals, the data of each terminal is typically not independent and identically distributed (non-IID); personal habits can make the data non-IID. As a result, the model optimization directions of the vehicle-mounted terminals become inconsistent and generalization is poor. Meanwhile, federated learning still has security problems, and users' private information can still be leaked.
In view of the above problems, various solutions have been proposed in the prior art, such as federated learning based on homomorphic encryption and federated learning based on secure multiparty computation; these, however, are costly in computation and communication. One possible approach to this challenge is federated learning algorithms based on differential privacy, which have a clear lightweight advantage over the other methods. Currently, federated learning work based on differential privacy techniques falls mainly into two categories: (1) before uploading parameters, each user applies local differential privacy to noise the model parameters it uploads; (2) centralized differential privacy is used to noise the aggregated gradients at the central aggregation server. The idea of differential privacy here is to add random Gaussian noise to the model parameters of the vehicle-mounted terminal's deep convolutional network, so as to keep traffic sign recognition private.
However, in practical applications, users' privacy requirements may differ by profession and region. The two types of federated learning work described above assume an evenly distributed privacy budget, which is impractical given legal, national, or professional background differences. Furthermore, a unified privacy budget level means that some clients waste a large amount of privacy budget, which tends to negatively impact model accuracy.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides a personalized federated learning identification method and system based on differential privacy, which can meet the different privacy requirements of different users and reduce the influence of the added noise on model accuracy.
The invention provides the following technical scheme:
In a first aspect, a personalized federated learning identification method based on differential privacy is provided. The method is executed at a client and comprises:
acquiring initialization model parameters from a server, and loading the initialization model parameters into a pre-built local model to obtain a parameter-loaded local model;
based on a privacy budget, performing noise processing on the parameter-loaded local model with a personalized differential privacy algorithm, and sending the shared-layer model parameters of the noise-processed local model to the server;
loading the aggregated shared-layer model parameters from the server into the shared layer of the noise-processed local model, freezing the shared layer, and fine-tuning the fully connected layer to obtain a personalized model; and
repeatedly training the personalized model until the number of global iterations is reached, obtaining a differential-privacy-based personalized federated learning test model.
Further, a dynamic convolution layer is used when building the local model. The dynamic convolution layer comprises $E$ parallel convolution kernels $\tilde{W}_e$; these kernels are dynamically aggregated for each individual input $x$ through the input-dependent attention $\pi_e(x)$ as $\tilde{W}(x)=\sum_{e=1}^{E}\pi_e(x)\tilde{W}_e$, and the biases are aggregated with the same attention as $\tilde{b}(x)=\sum_{e=1}^{E}\pi_e(x)\tilde{b}_e$, where $0\le\pi_e(x)\le 1$, $\sum_{e=1}^{E}\pi_e(x)=1$, and $E$ is the number of parallel convolution kernels.
Further, after the initialization model parameters are loaded into the pre-built local model, hyperparameters and a minimized loss function are set on the parameter-loaded local model.
Further, the minimized loss function is:

$$F_k\left(w_k^t\right)=\frac{1}{n_k}\sum_{i=1}^{n_k} l_i\left(w^t;x_i\right)+\frac{\mu}{2}\left\|\theta_k^t-w_g^t\right\|^2$$

wherein $w_k^t=(\theta_k^t,v_k^t)$ denotes the model parameters uploaded by participating client $k$, $\theta_k^t$ denotes the shared-layer parameters of participating client $k$ after $t$ local iterations, $v_k^t$ denotes the fully-connected-layer parameters of participating client $k$ after $t$ local iterations, $w^t$ denotes the aggregated model parameters, $n_k$ is the number of samples in the dataset held by client $k$, $w_g^t$ denotes the shared-layer model parameters after the $t$-th round of aggregation, $\mu$ is the coefficient of the regularization term, $x_i$ is a sample in the dataset, and $l_i$ is the prediction loss of the aggregated model parameters $w^t$ on sample $x_i$.
Further, the personalized differential privacy algorithm includes:
initializing the first-order and second-order moment estimates, and judging, based on the selected privacy budget, whether the number of local iterations has reached the preset number;
when the number of local iterations has not reached the preset number, randomly selecting samples, computing a gradient for each sample, and performing gradient clipping; adding Gaussian noise according to the privacy budget selected by the user; then sequentially updating the biased first-order moment estimate, updating the biased second-order moment estimate, correcting the bias of the first-order moment, and correcting the bias of the second-order moment; finally, computing and applying the update to complete one local iteration; and
when the number of local iterations has reached the preset number, directly outputting the noise-processed local model.
Further, if the maximum local privacy budget is greater than the global differential privacy budget, the aggregated and noise-processed shared-layer model parameters from the server are loaded into the noise-processed shared layer of the local model.
Further, the formulas for performing noise processing on the aggregated shared-layer model parameters are:

$$\sigma=\sqrt{\sigma_g^2-\frac{1}{K^2}\sum_{k=1}^{K}\sigma_k^2},\qquad \sigma_g=\frac{cC}{m\,\varepsilon_g},\qquad \sigma_k=\frac{cC}{m\,\varepsilon_k},\qquad c=\sqrt{2\ln(1.25/\delta)}$$

wherein $\sigma$ is the random noise parameter, $\varepsilon_g$ is the global differential privacy budget, $\sigma_g$ is the noise to be added under the budget $\varepsilon_g$, $\varepsilon_k$ is the differential privacy budget chosen by client $k$, $\sigma_k$ is the noise to be added under the budget $\varepsilon_k$, $C$ is the gradient clipping threshold, $K$ is the number of participating clients, $m$ is the minimum size of the local datasets, $c$ is a constant, and $\delta$ is the relaxation term, set to $10^{-5}$, indicating that a violation of strict differential privacy is tolerated only with probability $10^{-5}$.
Further, fine-tuning the fully connected layer includes training the fully connected layer using a preprocessed image dataset.
Further, the preprocessing method of the image dataset comprises the following steps:
scaling and cropping the picture data to obtain images of the same size;
normalizing the images of the same size so that each element of the image lies in the range $[-1,1]$, wherein the formula of the normalization is:

$$output_{channel}=\frac{input_{channel}-mean_{channel}}{std_{channel}}$$

wherein $channel$ is the channel index of the image matrix, $output_{channel}$ is the output image matrix, $input_{channel}$ is the input image matrix, $mean_{channel}$ is the mean of the image matrix, and $std_{channel}$ is the standard deviation of the image matrix.
In a second aspect, a personalized federated learning identification system based on differential privacy is provided, comprising:
an acquisition and data processing module, configured to acquire initialization model parameters from a server and load them into a pre-built local model to obtain a parameter-loaded local model;
a personalized differential privacy module, configured to perform noise processing on the parameter-loaded local model with a personalized differential privacy algorithm based on a privacy budget, and send the shared-layer model parameters of the noise-processed local model to the server;
a local fine-tuning module, configured to load the aggregated shared-layer model parameters from the server into the shared layer of the noise-processed local model, freeze the shared layer, and simultaneously fine-tune the fully connected layer to obtain a personalized model; and
an iterative training module, configured to repeatedly train the personalized model until the number of global iterations is reached, obtaining a differential-privacy-based personalized federated learning test model.
Compared with the prior art, the invention has the following beneficial effects:
(1) A client in the invention can add noise to its local model with the personalized differential privacy algorithm according to the privacy budget it selects, adapting to the different privacy requirements of different users, so that the model protects each user's privacy in a targeted way.
(2) The invention divides the local model into a shared layer and a non-shared layer (the fully connected output layer). The shared layer executes a general federated learning algorithm, while the parameters of the non-shared layer are not shared (the weights are kept locally), so that the model can become personalized. Meanwhile, differential privacy is combined with the Adam optimization algorithm: the exponentially weighted averaging idea of Adam reduces the influence of the added noise on model accuracy, achieving both privacy protection and reduced noise impact on accuracy.
(3) The invention uses a dynamic convolution layer when constructing the local model. A dynamic convolution layer is a convolution layer whose kernels are combined with attention; because the kernels are aggregated through attention in a nonlinear way, they have greater representation capability. It increases neither the depth nor the width of the network, but improves the model's capability by aggregating multiple convolution kernels, making it suitable for lower-performance terminals.
(4) The invention adds differential privacy to the shared-layer parameters uploaded to the server, which further improves privacy protection. After the server aggregates the shared-layer parameters and adds noise, the client freezes the shared-layer parameters and then trains the non-shared layer on its data, so that the non-shared-layer parameters adapt to and match the shared-layer parameters better, while also achieving the goal of personalized learning.
Drawings
FIG. 1 is a flow diagram of a personalized federated learning identification method based on differential privacy in an embodiment of the present invention;
FIG. 2 is a block diagram of a dynamic convolution layer (DConv) in an embodiment of the present invention;
FIG. 3 is a diagram of the architecture of the local model in an embodiment of the invention;
FIG. 4 is a flow chart of the personalized differential privacy algorithm in an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Example 1
This embodiment provides a personalized federated learning identification method based on differential privacy. The method is executed at a client and comprises the following steps:
acquiring initialization model parameters from a server, and loading the initialization model parameters into a pre-built local model to obtain a parameter-loaded local model;
based on a privacy budget, performing noise processing on the parameter-loaded local model with a personalized differential privacy algorithm, and sending the shared-layer model parameters of the noise-processed local model to the server;
loading the aggregated shared-layer model parameters from the server into the shared layer of the noise-processed local model, freezing the shared layer, and fine-tuning the fully connected layer to obtain a personalized model; and
repeatedly training the personalized model until the number of global iterations is reached, obtaining a differential-privacy-based personalized federated learning test model.
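The client-side flow above (initialize, locally train with noise, upload the shared layer, load the aggregate, fine-tune) can be sketched as a toy end-to-end simulation. Everything here — the scalar "model", the noise scale clip/ε, and the function names — is an illustrative stand-in, not the patent's actual implementation:

```python
import random

random.seed(0)

def client_round(model, data, eps, clip=1.0, lr=0.5):
    """One simplified local round: a clipped gradient step toward the local
    data mean, then Gaussian noise on the shared layer scaled by 1/eps
    (a smaller budget means more noise); the fc layer stays noise-free."""
    target = sum(data) / len(data)
    for layer in ("shared", "fc"):
        model[layer] = [w - lr * max(-clip, min(clip, w - target))
                        for w in model[layer]]
    model["shared"] = [w + random.gauss(0.0, clip / eps)
                       for w in model["shared"]]
    return model

def server_aggregate(shared_layers, sizes):
    """Dataset-size-weighted average of the uploaded shared layers."""
    n = sum(sizes)
    dim = len(shared_layers[0])
    return [sum(s[i] * nk / n for s, nk in zip(shared_layers, sizes))
            for i in range(dim)]

# two clients with different local data and different privacy budgets
clients = [
    {"model": {"shared": [0.0, 0.0], "fc": [0.0]}, "data": [1.0, 1.2], "eps": 10.0},
    {"model": {"shared": [0.0, 0.0], "fc": [0.0]}, "data": [0.8, 1.0], "eps": 50.0},
]
for _ in range(3):                                   # global iterations
    for c in clients:                                # local noised training
        client_round(c["model"], c["data"], c["eps"])
    w_g = server_aggregate([c["model"]["shared"] for c in clients],
                           [len(c["data"]) for c in clients])
    for c in clients:                                # load shared layer, then
        c["model"]["shared"] = list(w_g)             # fine-tune fc only
        mean = sum(c["data"]) / len(c["data"])
        c["model"]["fc"] = [w - 0.5 * (w - mean) for w in c["model"]["fc"]]
```

Collapsing the network to a few scalars keeps the round-trip visible: the shared layer ends up identical across clients, while each fully connected layer converges toward its own local data — the personalization effect described above.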
Example 2
As shown in FIG. 1, this embodiment provides a personalized federated learning identification method based on differential privacy, comprising the following steps:
step 1: and the client receives the initialization model parameters from the server, and builds a local model based on the initialization model parameters.
Specifically, the initialization model parameters the client receives from the server are the pre-training parameters of a model trained centrally, without personalization, on different subsets of data for the same task.
Step 1.1: as shown in FIGS. 2-3, when building the local model from the loaded initialization parameters, the dynamic convolution layer does not use a single convolution kernel per layer; instead it uses a set of $E$ parallel convolution kernels $\tilde{W}_e$. These kernels are dynamically aggregated for each individual input $x$ through the input-dependent attention $\pi_e(x)$ as $\tilde{W}(x)=\sum_{e=1}^{E}\pi_e(x)\tilde{W}_e$, and the biases are aggregated with the same attention as $\tilde{b}(x)=\sum_{e=1}^{E}\pi_e(x)\tilde{b}_e$, where $0\le\pi_e(x)\le 1$ and $\sum_{e=1}^{E}\pi_e(x)=1$.
Dynamic convolution follows the classical design of convolutional neural networks: after the aggregated convolution, batch normalization and an activation function (e.g., ReLU) are used to build the dynamic convolution layer. A dynamic convolution layer is a convolution layer whose kernels are combined with attention; it has greater representation capacity because the kernels are aggregated through attention in a nonlinear way. It increases neither the depth nor the width of the network, but improves the model's capability by aggregating multiple convolution kernels, providing a better trade-off between network performance and computational burden. In addition, a dynamic convolutional network model based on this layer can avoid interfering with other functions of the terminal without increasing the number of layers of the convolutional neural network, making it suitable for lower-performance terminals and avoiding consuming large amounts of the terminal's performance and time.
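The kernel-aggregation idea can be sketched in isolation. This is a minimal 1-D illustration; a real dynamic convolution layer (CondConv/DyConv style) computes the attention with a small pooling-plus-FC subnetwork, which is omitted here and replaced by given attention logits:

```python
import math

def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def dynamic_conv1d(x, kernels, biases, attn_logits):
    """Aggregate E parallel kernels with input-dependent attention pi_e
    (softmax guarantees 0 <= pi_e <= 1 and sum pi_e = 1), then apply the
    single aggregated kernel: y = (sum_e pi_e W_e) * x + sum_e pi_e b_e."""
    pi = softmax(attn_logits)
    E = len(kernels)
    width = len(kernels[0])
    w = [sum(pi[e] * kernels[e][i] for e in range(E)) for i in range(width)]
    b = sum(pi[e] * biases[e] for e in range(E))
    # valid 1-D convolution (cross-correlation) with the aggregated kernel
    out = []
    for i in range(len(x) - width + 1):
        out.append(sum(w[j] * x[i + j] for j in range(width)) + b)
    return out, pi

x = [1.0, 2.0, 3.0, 4.0]
kernels = [[1.0, 0.0], [0.0, 1.0]]     # E = 2 parallel kernels
biases = [0.0, 1.0]
y, pi = dynamic_conv1d(x, kernels, biases, attn_logits=[0.0, 0.0])
# with equal attention the aggregated kernel is [0.5, 0.5] with bias 0.5
```

Only one convolution is executed per input: the E kernels are fused into a single kernel before the sliding-window pass, which is why the aggregation adds capacity without adding depth or width.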
Step 1.2: after the initialization model parameters are loaded into the pre-built local model, hyperparameters and a minimized loss function are set on the parameter-loaded local model.
In this embodiment, the hyperparameters are set as follows: the local batch size is 40, the number of local iterations is 5, the number of global iterations is 10, the number of clients is 80, the client sampling ratio is 0.5, the global privacy budget is 50, the relaxation term of differential privacy is 1/6000, the gradient clipping threshold is 1, the optimizer is the SGD algorithm, and the learning rate is 0.005.
By setting the minimized loss function, the closeness of the model's output to the correct output value can be judged, and at the same time the local update is kept from drifting too far from the initial global model, reducing the influence of non-IID data while tolerating system heterogeneity. The minimized loss function is:

$$F_k\left(w_k^t\right)=\frac{1}{n_k}\sum_{i=1}^{n_k} l_i\left(w^t;x_i\right)+\frac{\mu}{2}\left\|\theta_k^t-w_g^t\right\|^2 \qquad (1)$$

wherein $w_k^t=(\theta_k^t,v_k^t)$ denotes the model parameters uploaded by participating client $k$, $\theta_k^t$ denotes the shared-layer parameters of participating client $k$ after $t$ local iterations, $v_k^t$ denotes the fully-connected-layer parameters of participating client $k$ after $t$ local iterations, $w^t$ denotes the aggregated model parameters, $n_k$ is the number of samples in the dataset held by client $k$, $w_g^t$ denotes the shared-layer model parameters after the $t$-th round of aggregation, $\mu$ is the coefficient of the regularization term, $x_i$ is a sample in the dataset, and $l_i$ is the prediction loss of the aggregated model parameters $w^t$ on sample $x_i$.
In equation (1), the first term $\frac{1}{n_k}\sum_{i=1}^{n_k} l_i\left(w^t;x_i\right)$ is the cross-entropy loss, mainly used to judge how close the model's output is to the correct output value; the second term $\frac{\mu}{2}\left\|\theta_k^t-w_g^t\right\|^2$ keeps the local update from drifting far from the initial global model, reducing the influence of non-IID data while tolerating system heterogeneity.
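Under the reading above — cross-entropy plus a FedProx-style proximal term on the shared-layer parameters — the loss can be sketched for a toy classifier. All names here are illustrative assumptions:

```python
import math

def local_loss(logits_fn, params, samples, labels, shared, w_global, mu):
    """F_k = (1/n_k) * sum_i CE(softmax(logits(x_i)), y_i)
           + (mu/2) * || theta_k - w_g ||^2"""
    n = len(samples)
    ce = 0.0
    for x, y in zip(samples, labels):
        zs = logits_fn(params, x)
        m = max(zs)
        log_norm = m + math.log(sum(math.exp(z - m) for z in zs))
        ce += log_norm - zs[y]          # -log softmax(zs)[y], numerically stable
    prox = 0.5 * mu * sum((a - b) ** 2 for a, b in zip(shared, w_global))
    return ce / n + prox

# toy linear model with two classes: logits = [w0 * x, w1 * x]
logits = lambda p, x: [p[0] * x, p[1] * x]
params = [1.0, -1.0]
loss = local_loss(logits, params, samples=[1.0], labels=[0],
                  shared=[1.0, -1.0], w_global=[0.0, 0.0], mu=0.1)
```

The proximal term is what distinguishes this from plain local training: even a perfectly fit local model pays a penalty for moving its shared layer away from the last aggregate.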
Step 2: based on the privacy budget selected by the user, the client performs noise processing on the parameter-loaded local model with the personalized differential privacy algorithm, and sends the shared-layer model parameters of the noise-processed local model to the server.
Step 2.1: each client can set its privacy budget in a personalized way according to its own privacy requirements (which differ by profession and region), so that the model protects each user's privacy in a targeted way.
As shown in fig. 4, the personalized differential privacy algorithm includes the steps of:
(1) Initializing a first moment estimation and a second moment estimation;
(2) Client k selects a privacy budget;
(3) Judging whether the number of local iterations has reached the preset number;
(4) Randomly selecting samples when the number of local iterations has not reached the preset number;
(5) Calculating a gradient for each sample;
(6) Performing gradient cutting;
(7) Adding Gaussian noise according to privacy budget selected by a user;
(8) Updating the biased first-order moment estimate;
(9) Updating the biased second-order moment estimate;
(10) Correcting the deviation of the first moment;
(11) Correcting the deviation of the second moment;
(12) Computing the update;
(13) Applying the update to complete one local iteration;
(14) Continuing the local iterations, and when the number of local iterations reaches the preset number, directly outputting the noise-processed local model.
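Read as a DP-SGD-style update combined with Adam's bias-corrected moment estimates, steps (1)-(14) can be sketched for a single scalar parameter. The quadratic toy loss and the noise scale clip/ε are illustrative assumptions, not the patent's exact algorithm:

```python
import math
import random

random.seed(1)

def dp_adam_round(w, data, eps, iters=5, lr=0.05, clip=1.0,
                  beta1=0.9, beta2=0.999, tiny=1e-8):
    m, v = 0.0, 0.0                      # (1) init first/second moment estimates
    sigma = clip / eps                   # noise scale from the chosen budget (2)
    for t in range(1, iters + 1):        # (3) until the preset iteration count
        x = random.choice(data)          # (4) randomly select a sample
        g = w - x                        # (5) per-sample gradient of 0.5*(w-x)^2
        g = max(-clip, min(clip, g))     # (6) gradient clipping
        g += random.gauss(0.0, sigma)    # (7) Gaussian noise per the budget
        m = beta1 * m + (1 - beta1) * g          # (8) biased first moment
        v = beta2 * v + (1 - beta2) * g * g      # (9) biased second moment
        m_hat = m / (1 - beta1 ** t)             # (10) bias-correct first moment
        v_hat = v / (1 - beta2 ** t)             # (11) bias-correct second moment
        step = lr * m_hat / (math.sqrt(v_hat) + tiny)   # (12) compute update
        w -= step                                # (13)-(14) apply the update
    return w

w_noisy = dp_adam_round(0.0, data=[1.0, 1.2, 0.9], eps=5.0)
```

The exponentially weighted moments are the point of using Adam here: a single noisy gradient is averaged into `m` and `v` rather than applied directly, which damps the effect of the injected noise on each step.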
Definition of personalized differential privacy: $\varepsilon_k$ is the privacy budget level selected by the user, and its value determines the local noise to be added. For client $k$ (i.e., user $k$), if for any two adjacent records $t$ and $t'$ and any output set $S$ the random algorithm $M$ satisfies equation (2),

$$\Pr[M(t)\in S]\le e^{\varepsilon_k}\Pr[M(t')\in S] \qquad (2)$$

then $M$ is said to satisfy $\varepsilon_k$-PDP.
The privacy budget $\varepsilon_k$ of client $k$ is selected from a set of privacy budget levels, where the classification policy divides the budget into three levels: $\varepsilon_{low}$, $\varepsilon_{mid}$, and $\varepsilon_{high}$.
The noise mechanism is the main technique for realizing differential privacy; several noise mechanisms are currently available, such as the Laplace mechanism and the Gaussian mechanism. Because the curve of the Gaussian distribution is "flatter", applying the Gaussian mechanism is more likely to produce a differentially private output far from the true value, so its privacy-preserving effect is generally better than that of the Laplace mechanism.
Gaussian mechanism: assume there is a function $f: D \to \mathbb{R}^d$ with sensitivity $\Delta f$. When $N \sim \mathcal{N}(0,\sigma^2)$ with $\sigma = c\,\Delta f/\varepsilon$, the random algorithm $M = f(D) + N$ provides $(\varepsilon,\delta)$-DP privacy protection, where the parameter $c \ge \sqrt{2\ln(1.25/\delta)}$. The probability density function of the Gaussian distribution is given by equation (3):

$$p(x)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (3)$$

wherein $\sigma$ denotes the standard deviation of the Gaussian distribution, $\mu$ denotes its expected value, and the sensitivity is $\Delta f=\max_{D,D'}\left\|f(D)-f(D')\right\|_2$, where $f$ is the query function. The Gaussian mechanism adopts the $l_2$ sensitivity, making it suitable for the training process of federated learning.
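A minimal sketch of the Gaussian mechanism for a scalar query, with σ calibrated as Δf·√(2 ln(1.25/δ))/ε — the standard calibration for (ε, δ)-DP. The function and variable names are illustrative:

```python
import math
import random

random.seed(42)

def gaussian_mechanism(true_value, sensitivity, eps, delta):
    """Release true_value + N(0, sigma^2), with sigma calibrated so the
    released value satisfies (eps, delta)-differential privacy."""
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    return true_value + random.gauss(0.0, sigma), sigma

# a count query over a toy dataset: its l2 sensitivity is 1, because
# adding or removing one record changes the count by at most 1
noisy_count, sigma = gaussian_mechanism(true_value=100.0, sensitivity=1.0,
                                        eps=1.0, delta=1e-5)
```

Note the roles of the two knobs: shrinking ε (stricter privacy) or δ (less tolerance for a privacy failure) both increase σ, i.e., more noise.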
Step 2.2: client $k$ sends the noise-processed shared-layer model parameters $\theta_k$ of its local model to the server for averaging. The fully-connected-layer parameters $v_k$ of client $k$ are kept at the client, as is each client's local data. Thus, only the parameters of the convolutional layers are averaged into $w_g$, which is then used to update each local client model.
Step 3: the server aggregates and adds noise to the shared-layer model parameters $\theta_k$ from the clients, and sends the aggregated and noised shared-layer model parameters back to the clients.
Step 3.1: the server aggregates the shared-layer model parameters $\theta_k$ using equation (4):

$$w_g=\sum_{k=1}^{K} p_k\,\theta_k,\qquad p_k=\frac{n_k}{n} \qquad (4)$$

In equation (4), $n$ is the total amount of data across all client datasets, $n_k$ is the number of samples held by client $k$, $K$ is the number of participating clients, and $p_k$ is the proportion of client $k$'s dataset in the total amount of data, used in the server's aggregation computation.
Step 3.2: in the personalized federated learning privacy-protection scenario, the noise added by the users is not uniform; adding noise through the differential privacy algorithm makes the differential privacy protection level of the federated learning process quantifiable, so that the privacy protection level of federated learning can be controlled.
Specifically, if the maximum local privacy budget is greater than the global differential privacy budget threshold (the high encryption level is not reached), the aggregated and noise-processed shared-layer model parameters from the server are loaded into the noise-processed shared layer of the local model; otherwise, the aggregated shared-layer model parameters $w_g$ are sent directly to the clients.
Specifically, the formulas for performing noise processing on the aggregated shared-layer model parameters are:

$$\sigma=\sqrt{\sigma_g^2-\frac{1}{K^2}\sum_{k=1}^{K}\sigma_k^2},\qquad \sigma_g=\frac{cC}{m\,\varepsilon_g},\qquad \sigma_k=\frac{cC}{m\,\varepsilon_k},\qquad c=\sqrt{2\ln(1.25/\delta)} \qquad (5)$$

wherein $\sigma$ is the random noise parameter, $\varepsilon_g$ is the global differential privacy budget, $\sigma_g$ is the noise to be added under the budget $\varepsilon_g$, $\varepsilon_k$ is the differential privacy budget chosen by client $k$, $\sigma_k$ is the noise to be added under the budget $\varepsilon_k$, $C$ is the gradient clipping threshold, $K$ is the number of participating clients, $m$ is the minimum size of the local datasets, $c$ is a constant, and $\delta$ is the relaxation term, set to $10^{-5}$, indicating that a violation of strict differential privacy is tolerated only with probability $10^{-5}$.
Step 4: the client loads the aggregated (and noise-processed) shared-layer model parameters from the server into the shared layer of the noise-processed local model, then freezes the shared layer, and simultaneously fine-tunes the fully connected layer to obtain the personalized model.
Step 4.1: the shared-layer model parameters $w_g$ from the server are loaded into the shared layer of the local model.
Step 4.2: the client freezes the shared-layer parameters $w_g$ from the server, and the learning rate $\alpha$ of each client's optimizer is reduced by a factor $\eta$ to avoid overshooting; the reduced learning rate ensures that the local layers slowly "reconnect" to the previously separated, globally averaged convolutional layers.
Step 4.3: fine-tuning the fully connected layer is a local fine-tuning. Local fine-tuning means training part of the local model (the non-shared layer) with the local dataset. The shared layer, after the server's aggregation and noise processing, can adapt to and match the original non-shared layer better, so that local data can be classified better; at the same time, local fine-tuning ensures that the model becomes personalized, because in this step the fully connected layer of each client is trained only on locally available data.
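A minimal sketch of the freeze-and-fine-tune step. In a deep-learning framework this would be done by setting `requires_grad=False` on the shared layers; this toy version simply restricts updates to the fc parameters (all names and the scalar training rule are illustrative):

```python
def fine_tune(model, data, lr=0.2, epochs=10):
    """Freeze model['shared'] (never updated) and train model['fc'] toward
    the local data mean — a stand-in for training the fully connected
    layer on locally available data only."""
    frozen = list(model["shared"])
    target = sum(data) / len(data)
    for _ in range(epochs):
        model["fc"] = [w - lr * (w - target) for w in model["fc"]]
    assert model["shared"] == frozen    # the shared layer is untouched
    return model

model = {"shared": [0.3, -0.1], "fc": [0.0]}
fine_tune(model, data=[1.0, 1.2, 0.8])
```

Because only the fc parameters move, two clients starting from the same aggregated shared layer end up with the same features but different classification heads, each fitted to its own data.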
Specifically, fine-tuning the fully connected layer includes training the fully connected layer using a preprocessed image dataset. The preprocessing of the image dataset comprises the following steps:
(1) Scaling and cropping the picture data to obtain images of the same size.
In this embodiment, the picture data come from pictures with a high degree of privacy, such as traffic sign pictures; the dataset used is mainly the German GTSRB dataset (German Traffic Sign Recognition Benchmark), which contains traffic sign detection data and assists drivers in traffic sign recognition through pattern recognition technology.
In this embodiment, the client first scales the width and height of each traffic sign picture to 40×40, then center-crops the 40×40 picture to 32×32, so that all pictures have the same set size.
(2) Normalizing the images of the same size. Normalization ensures that the network converges well; before the relative importance of each input dimension is known, it makes the distributions of the input dimensions similar, allowing the training process to treat every dimension equally (i.e., with the same learning rate, the same regularization coefficient, the same weight initialization, and the same activation function). Here, as shown in equation (6), the image is normalized channel by channel (the mean becomes 0 and the standard deviation becomes 1) so that each element of the image lies within $[-1,1]$. The formula of the normalization is:

$$output_{channel}=\frac{input_{channel}-mean_{channel}}{std_{channel}} \qquad (6)$$

wherein $channel$ is the channel index of the image matrix, $output_{channel}$ is the output image matrix, $input_{channel}$ is the input image matrix, $mean_{channel}$ is the mean of the image matrix, and $std_{channel}$ is the standard deviation of the image matrix.
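Equation (6) can be applied with the dataset's own per-channel statistics (giving zero mean and unit standard deviation) or, as is common practice, with fixed values mean = std = 0.5, which map [0, 1] pixel values into [-1, 1]. A minimal sketch of the latter; the names are illustrative:

```python
def normalize(channel_pixels, mean, std):
    """output = (input - mean) / std, applied per channel (equation (6))."""
    return [(p - mean) / std for p in channel_pixels]

# pixel values in [0, 1]; mean = std = 0.5 maps the channel into [-1, 1]
channel = [0.0, 0.25, 0.5, 0.75, 1.0]
out = normalize(channel, mean=0.5, std=0.5)
```

For a 3-channel image the same function is simply applied three times, one (mean, std) pair per channel.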
Step 4.4: inference is performed on new data using the obtained model parameters.
Step 5: return to step 2 and repeatedly train the personalized model obtained in step 4 until the number of global iterations is reached, then output the differential-privacy-based personalized federated learning test model.
Example 3
This embodiment provides a personalized federated learning identification system based on differential privacy, comprising:
an acquisition and data processing module, configured to acquire initialization model parameters from a server and load them into a pre-built local model to obtain a parameter-loaded local model;
a personalized differential privacy module, configured to perform noise processing on the parameter-loaded local model with a personalized differential privacy algorithm based on a privacy budget, and send the shared-layer model parameters of the noise-processed local model to the server;
a local fine-tuning module, configured to load the aggregated shared-layer model parameters from the server into the shared layer of the noise-processed local model, freeze the shared layer, and simultaneously fine-tune the fully connected layer to obtain a personalized model; and
an iterative training module, configured to repeatedly train the personalized model until the number of global iterations is reached, obtaining a differential-privacy-based personalized federated learning test model.
The personalized federated learning identification system based on differential privacy provided by this embodiment can execute the personalized federated learning identification method based on differential privacy provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may refer to one another. The device disclosed in an embodiment corresponds to the method disclosed in that embodiment, so its description is relatively brief; for the relevant points, refer to the description of the method section.
Those of skill would further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative units and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art could make modifications and variations without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as falling within the scope of the invention.

Claims (10)

1. A personalized federal learning identification method based on differential privacy, the method being performed at a client and comprising:
acquiring initialization model parameters from a server, and loading the initialization model parameters into a pre-built local model to obtain a parameter-loaded local model;
performing noise processing on the parameter-loaded local model with a personalized differential privacy algorithm based on the privacy budget, and sending the shared-layer model parameters of the noise-processed local model to the server;
loading the aggregated shared-layer model parameters from the server into the shared layer of the noise-processed local model, freezing the shared layer, and fine-tuning the fully connected layer to obtain a personalized model;
and repeatedly training the personalized model until the global iteration count is reached, obtaining the differential privacy-based personalized federal learning test model.
2. The differential privacy-based personalized federal learning identification method of claim 1, wherein a dynamic convolution layer is used in building the local model, the dynamic convolution layer comprising E parallel convolution kernels W_1, …, W_E; for each individual input x, the convolution kernels are dynamically aggregated through the input-dependent attention π_e(x) as W(x) = Σ_{e=1}^{E} π_e(x)·W_e, and the bias is aggregated with the same attention as b(x) = Σ_{e=1}^{E} π_e(x)·b_e, wherein 0 ≤ π_e(x) ≤ 1, Σ_{e=1}^{E} π_e(x) = 1, and E is the number of parallel convolution kernels.
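The attention-weighted aggregation described in claim 2 can be illustrated with a small sketch. The per-kernel attention used here (a softmax over toy scores derived from the input mean) is an assumption for demonstration; only the aggregation rule itself — mixing E kernels and biases with input-dependent weights that are non-negative and sum to one — follows the claim.

```python
# Sketch of dynamic-convolution parameter aggregation (claim 2).
import numpy as np

def dynamic_conv_params(x, kernels, biases):
    """kernels: (E, k, k); biases: (E,). Returns one aggregated kernel/bias."""
    E = kernels.shape[0]
    # Input-dependent attention pi_e(x): softmax over simple per-kernel scores
    # (a stand-in for the small attention network a real layer would use).
    scores = np.array([float(np.mean(x)) * e for e in range(E)])
    pi = np.exp(scores - scores.max())
    pi = pi / pi.sum()                       # 0 <= pi_e <= 1 and sum_e pi_e = 1
    W = np.tensordot(pi, kernels, axes=1)    # W(x) = sum_e pi_e(x) * W_e
    b = float(pi @ biases)                   # b(x) = sum_e pi_e(x) * b_e
    return W, b, pi
```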
3. The differential privacy-based personalized federal learning identification method of claim 1, wherein, after the initialization model parameters are loaded into the pre-built local model, the hyperparameters and the loss-function minimization objective are set on the parameter-loaded local model.
4. The differential privacy-based personalized federal learning identification method of claim 3, wherein the minimization of the loss function is:

min F_k(w_k^t) = (1/n_k)·Σ_{i=1}^{n_k} l_i(w_k^t; x_i) + (μ/2)·‖w_{k,s}^t − w_s^t‖²

wherein w_k^t = (w_{k,s}^t, w_{k,f}^t) denotes the model parameters uploaded by participating client k, w_{k,s}^t denotes the shared-layer parameters of participating client k after t local iterations, w_{k,f}^t denotes the fully connected layer parameters of participating client k after t local iterations, w denotes the aggregated model parameters, n_k is the size of the dataset held by client k, w_s^t is the shared-layer model parameter after the t-th aggregation, μ is the parameter of the regularization term, x_i is a sample in the dataset, and l_i is the prediction loss of the aggregated model parameters w on the sample x_i.
5. The personalized federal learning identification method based on differential privacy of claim 1, wherein the personalized differential privacy algorithm comprises:
initializing the first-order and second-order moment estimates, and judging, based on the selected privacy budget, whether the number of local iterations has reached the preset number;
when the number of local iterations has not reached the preset number, randomly selecting samples, computing the gradient for each sample, and performing gradient clipping; adding Gaussian noise according to the privacy budget selected by the user; then sequentially updating the biased first-order moment estimate, updating the biased second-order moment estimate, correcting the bias of the first-order moment, and correcting the bias of the second-order moment; and finally computing and applying the update to complete one local iteration;
and when the number of local iterations has reached the preset number, directly outputting the noise-processed local model.
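The local iteration of claim 5 — per-sample gradient computation, clipping, Gaussian noise scaled by the client-chosen budget, and Adam-style moment updates with bias correction — can be sketched as follows. The toy quadratic loss, the noise scale C/ε, and all hyperparameter values are assumptions, not the patent's.

```python
# Sketch of one DP local iteration with Adam-style updates (claim 5).
import numpy as np

def dp_adam_step(w, samples, state, epsilon, C=1.0,
                 lr=0.01, b1=0.9, b2=0.999, eps=1e-8, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = []
    for x in samples:                       # gradient per sample, then clip
        g = 2.0 * (w - x)                   # toy loss: (w - x)^2
        norm = np.linalg.norm(g)
        clipped.append(g / max(1.0, norm / C))
    g = np.mean(clipped, axis=0)
    g += rng.normal(0.0, C / epsilon, size=g.shape)   # budget-scaled noise
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g       # biased first moment
    state["v"] = b2 * state["v"] + (1 - b2) * g ** 2  # biased second moment
    m_hat = state["m"] / (1 - b1 ** state["t"])       # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)    # compute + apply update
```

Repeating this step until the preset local iteration count is reached yields the noise-processed local model of the claim.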
6. The differential privacy-based personalized federal learning identification method of claim 1, wherein, if the maximum local privacy budget is greater than the global differential privacy budget, the shared-layer model parameters that have been aggregated and noise-processed by the server are loaded into the shared layer of the noise-processed local model.
7. The differential privacy-based personalized federal learning identification method of claim 6, wherein the formula for noise processing of the aggregated shared-layer model parameters is:

σ = √( σ_g² − (1/K²)·Σ_{k=1}^{K} σ_k² ), with σ_g = 2cC/(m·ε_g) and σ_k = 2cC/(m·ε_k)

wherein σ is the random noise parameter, ε_g is the global differential privacy budget, σ_g is the noise to be added for the differential privacy budget ε_g, ε_k is the differential privacy budget selected by client k, σ_k is the noise to be added for the differential privacy budget ε_k, C is the threshold for gradient clipping, K is the number of participating clients, m is the minimum size of the local datasets, c is a constant satisfying c ≥ √(2·ln(1.25/δ)), and δ is the relaxation term, set to 10⁻⁵.
8. The differential privacy-based personalized federal learning identification method of claim 1, wherein fine-tuning the fully connected layer comprises training the fully connected layer with a preprocessed image dataset.
9. The differential privacy-based personalized federal learning identification method of claim 8, wherein the image dataset preprocessing method comprises:
scaling and cropping the picture data to obtain images of the same size;
performing standardization processing on the images of the same size so that every element of each image lies in the range [−1, 1], wherein the formula of the standardization processing is:

output[channel] = (input[channel] − mean[channel]) / std[channel]

wherein channel is the channel index of the image matrix, output[channel] is the output image matrix, input[channel] is the input image matrix, mean[channel] is the mean of the image matrix, and std[channel] is the standard deviation of the image matrix.
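The channel-wise standardization of claim 9 can be sketched as below; the helper name and the sample statistics are illustrative. With a mean of 0.5 and a standard deviation of 0.5 per channel, inputs in [0, 1] map into [−1, 1].

```python
# Sketch of per-channel standardization (claim 9).
import numpy as np

def standardize(image, means, stds):
    """image: (C, H, W); means, stds: per-channel statistics."""
    out = np.empty_like(image, dtype=np.float64)
    for c in range(image.shape[0]):
        # output[channel] = (input[channel] - mean[channel]) / std[channel]
        out[c] = (image[c] - means[c]) / stds[c]
    return out
```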
10. A personalized federal learning identification system based on differential privacy, comprising:
the acquisition and data processing module, used for acquiring the initialization model parameters from the server and loading them into the pre-built local model to obtain a parameter-loaded local model;
the personalized differential privacy module, used for performing noise processing on the parameter-loaded local model with a personalized differential privacy algorithm based on the privacy budget, and sending the shared-layer model parameters of the noise-processed local model to the server;
the local fine-tuning module, used for loading the aggregated shared-layer model parameters from the server into the shared layer of the noise-processed local model, freezing the shared layer, and fine-tuning the fully connected layer to obtain a personalized model;
and the iterative training module, used for repeatedly training the personalized model until the global iteration count is reached, obtaining the differential privacy-based personalized federal learning test model.
CN202311150942.3A 2023-09-07 2023-09-07 Differential privacy-based personalized federal learning identification method and system Pending CN117196012A (en)

Publications (1)

Publication Number Publication Date
CN117196012A true CN117196012A (en) 2023-12-08

Family
ID=88984411

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117852627A (en) * 2024-03-05 2024-04-09 湘江实验室 Pre-training model fine tuning method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination