CN115660050A - Robust federated learning method with efficient privacy protection - Google Patents

Robust federated learning method with efficient privacy protection

Info

Publication number
CN115660050A
CN115660050A CN202211382903.1A CN202211382903A
Authority
CN
China
Prior art keywords
matrix
servers
shared
model
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211382903.1A
Other languages
Chinese (zh)
Inventor
周培钊
张宝磊
刘哲理
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202211382903.1A priority Critical patent/CN115660050A/en
Publication of CN115660050A publication Critical patent/CN115660050A/en
Pending legal-status Critical Current

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The invention belongs to the field of federated learning robustness and efficiency research, and particularly relates to a robust federated learning method with efficient privacy protection. The method comprises the following steps: step 1, two servers cooperatively select n clients to participate in the federated learning process and each broadcasts its own global model to the clients; step 2, each client initializes its local model with the received global model, trains the local model on its local data set, and sends the trained local model to the two servers; step 3, the two servers construct a shared matrix from the received local models and project it into a dimension-reduced shared matrix; step 4, the two servers cooperatively execute a Byzantine-resilient aggregation algorithm on the shared matrix to obtain a new global model. The invention reduces the number of privacy-preserving multiplications in the aggregation algorithm, preserves the data privacy advantage of federated learning, and effectively improves the efficiency of the federated learning algorithm.

Description

Robust federated learning method with efficient privacy protection
Technical Field
The invention belongs to the field of federated learning robustness and efficiency research, and particularly relates to a robust federated learning method with efficient privacy protection.
Background
As is well known, machine learning requires a large amount of data to train a model. However, as more and more countries emphasize the protection of personal privacy, regulations such as the General Data Protection Regulation (GDPR) make it impossible for sensitive private data to be collected and accessed directly. To address this problem, Federated Learning provides a collaborative machine learning framework that allows multiple users or organizations (clients) to jointly train a shared global model without sharing their own data. In federated learning, a central server typically coordinates multiple clients over multiple iterations to solve an optimization problem: the server communicates with each client and obtains a new global model by aggregating the clients' local models in each iteration. Federated learning is particularly suitable for collaborative training across massive numbers of IoT and mobile phone end users.
Despite its advantages, federated learning is susceptible to Byzantine attacks and privacy inference attacks. Byzantine attacks mainly occur at the clients: a Byzantine client injects a malicious local model into the aggregation process, degrading the performance of the new global model. Privacy inference attacks mainly occur at the server, which can infer attributes of the client data, or even reconstruct the client data, from a client's local model. Few defensive measures address Byzantine attacks and privacy inference attacks simultaneously. Recent studies have shown that even a single client with a Byzantine failure, including computation failures, equipment failures, or deviations in data samples and labels, can break linear aggregation rules such as FedAvg, and in the worst case a malicious client may deliberately attempt to corrupt the global model. To address these problems, the machine learning community has recently developed several Byzantine-resilient aggregation rules, typically including Krum and Multi-Krum, and Median and Trimmed Mean. The former select, based on the Euclidean distances between local models, the local model closest to the other local models as the new global model; the latter compute the median of all local model parameters and take it as the corresponding parameter of the new global model. For preventing privacy inference attacks, Homomorphic Encryption and Secure Multi-Party Computation (MPC) protocols are widely applied: homomorphic encryption has high computation overhead and supports a limited set of operations, while secure multi-party computation has low computation overhead and supports a wide range of operations but incurs high communication overhead among the participants. Some existing defense schemes use secret sharing techniques to implement privacy-preserving distance computation and apply distance-based Byzantine-resilient aggregation rules (e.g., Multi-Krum) to obtain new global models; however, these schemes are computationally expensive, and the disclosure of pairwise distances may pose new privacy threats. Therefore, there is an urgent need to develop a new, effective and more secure federated learning scheme.
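For reference, the coordinate-wise family of rules mentioned above can be sketched in plaintext as follows; this is a simplified illustration on clear values (array shapes are assumptions), not part of the claimed privacy-preserving scheme.

```python
# Plaintext sketches of the coordinate-wise Byzantine-resilient rules mentioned
# above (Median and Trimmed Mean); purely illustrative reference code operating
# on clear values, not on the secret shares used by the invention.
import numpy as np

def median_aggregate(local_models):
    # local_models: (n_clients, n_params) array; take the median of every parameter
    return np.median(local_models, axis=0)

def trimmed_mean_aggregate(local_models, beta):
    # drop the beta largest and beta smallest values of every parameter, then average
    sorted_params = np.sort(local_models, axis=0)
    return sorted_params[beta:len(local_models) - beta].mean(axis=0)
```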
Disclosure of Invention
To this end, the present invention provides a robust federated learning method with efficient privacy protection. On the one hand, the object of the invention is to prevent the servers from inferring attributes of the private data from the local models, or even reconstructing the original data. On the other hand, under the protection of the MPC protocol, the number of privacy-preserving multiplications in the federated learning algorithm is reduced, which lowers the overall overhead and improves efficiency, so that the federated learning algorithm achieves better scalability while remaining robust.
To reduce the number of privacy-preserving multiplications, it is innovatively proposed to perform a random projection on the secret shares of all local models before computing the pairwise distances: a projection matrix with elements +1 or -1 is generated at random, the shared matrix of the local models is multiplied directly by the projection matrix, and the local models in the high-dimensional space are thereby projected into a low-dimensional space. The projection introduces only additions and subtractions, which are free in the MPC protocol, and requires no communication. The number of privacy-preserving multiplications required to compute the pairwise distances in the low-dimensional space is therefore greatly reduced.
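A minimal plaintext sketch of this idea is given below: it checks that a randomly generated +/-1 projection approximately preserves pairwise distances between high-dimensional vectors. The client count, model dimension, projected dimension and random data are illustrative assumptions, and the computation is shown on clear values rather than on secret shares.

```python
# A plaintext check that a random +/-1 projection approximately preserves the
# pairwise distances between high-dimensional local models. The client count,
# model dimension, projected dimension and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20, 10000, 400           # clients, model dimension, projected dimension
M = rng.normal(size=(n, d))        # rows stand in for flattened local models

S = rng.choice([-1.0, 1.0], size=(d, k))   # projection matrix with entries +1 / -1
M_proj = (M @ S) / np.sqrt(k)              # M' = (1/sqrt(k)) * M * S

def pairwise_sq_dists(X):
    # squared Euclidean distance between every pair of rows of X
    sq = np.sum(X ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

D_orig = pairwise_sq_dists(M)
D_proj = pairwise_sq_dists(M_proj)
off_diag = ~np.eye(n, dtype=bool)
rel_err = np.abs(D_proj - D_orig)[off_diag] / D_orig[off_diag]
print(f"max relative distance error after projection: {rel_err.max():.3f}")
```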
In order to realize the purpose, the invention adopts the following technical scheme:
a robust federated learning method with efficient privacy protection comprises the following steps,
step 1, two servers cooperatively select n clients to participate in the federated learning process, and each broadcasts its own global model to the clients;
step 2, the client initializes the local model by using the received global model, trains the local model by using a local data set, and sends the trained local model to the two servers;
step 3, the two servers construct a shared matrix from the received local models and project it into a dimension-reduced shared matrix;
step 4, the two servers cooperatively execute a Byzantine-resilient aggregation algorithm on the shared matrix to obtain a new global model.
In step 2, the floating point numbers in the trained local model are converted into integers, and the local model is then secret-shared via an MPC protocol and sent to the two servers.
In a further optimization of the technical scheme, in step 2 client i initializes the local model L_i with the received global model G and trains L_i with the local data set; the client then generates a pair of secret shares of L_i, ⟨L_i⟩^A = (⟨L_i⟩^A_1, ⟨L_i⟩^A_2), using the arithmetic sharing function in the ABY framework, where ⟨L_i⟩^A_j denotes the j-th share in the two-party arithmetic secret sharing of L_i; ⟨L_i⟩^A_1 is sent to server 1 and ⟨L_i⟩^A_2 to server 2.
In a further optimization of the technical scheme, in step 3 each server collects all the secret shares sent by all clients and constructs a shared matrix ⟨M⟩^A, where the i-th row of the shared matrix is ⟨L_i⟩^A; then both servers use the same random projection matrix to project ⟨M⟩^A into the dimension-reduced shared matrix ⟨M′⟩^A.
In a further optimization of the technical scheme, in step 3 the product of the projection matrix and the local model shares is preceded by a coefficient 1/√k, where k > k₀ = (4 + 2μ)·log n / (ε²/2 − ε³/3), ε is the expected distance-preservation error, μ is the projection success probability, and n represents the number of clients.
In a further optimization of the technical scheme, before the Byzantine-resilient aggregation algorithm of step 4 is constructed, three basic modules are built first: 1) ⟨D⟩^A ← DistanceToOther(⟨M⟩^A): the function takes the shared matrix ⟨M⟩^A as input and outputs a shared matrix ⟨D⟩^A of pairwise squared distances, where ⟨D[i][j]⟩^A and ⟨D[j][i]⟩^A both denote the share of the squared distance between the i-th and j-th rows of ⟨M⟩^A; the addition ADD, subtraction SUB and multiplication MUL operations of ABY are required. 2) ⟨D′⟩^A ← SortValue(⟨D⟩^A, axis): the function takes the shared matrix ⟨D⟩^A and an integer axis ∈ {0, 1} as input and outputs a sorted shared matrix ⟨D′⟩^A; if axis is 0 the function sorts each row of ⟨D⟩^A in ascending order, otherwise it sorts each column of ⟨D⟩^A in ascending order; the function implements a privacy-preserving bitonic sorting algorithm, which requires the comparison GT and exchange SWAP operations. 3) ⟨I⟩^A ← SortIndexByValue(⟨S⟩^A): the function takes a shared vector ⟨S⟩^A as input and outputs a shared index vector ⟨I⟩^A that sorts ⟨S⟩^A in ascending order; the function comprises the SortValue function and the GT and SWAP operations of ABY.
In step 4, the two servers exchange ⟨I⟩^A with each other, recover the index vector I, and then locally compute the sum of the original-dimension local models corresponding to the first n−β−1 indices, ⟨G⟩^A = Σ_{j=1}^{n−β−1} ⟨M[I_j]⟩^A, where n is the number of clients and β is the number of Byzantine clients; the new global model G is then obtained simply by the two servers cooperatively recovering G and dividing it by n−β−1.
Different from the prior art, the technical scheme has the following beneficial effects. The scheme is based on two non-colluding servers, its overhead is extremely low, and the distance between any two local models is never disclosed. A random projection is applied to the shared matrix of local models before the pairwise distances are computed, projecting the local models from the high-dimensional space into a low-dimensional space in which the pairwise distances between local models are preserved with small error, so that the number of privacy-preserving multiplications is greatly reduced. This idea can be applied to any privacy-preserving implementation of a distance-based Byzantine-resilient aggregation rule and has almost no influence on the robustness of the aggregation algorithm. Furthermore, to avoid the potential privacy threat caused by exposing the pairwise distances, the privacy-preserving Byzantine-resilient aggregation algorithm Multi-Krum is implemented on top of the ABY framework, exposing only the indices of the selected clients and the new global model, thereby preserving the data privacy advantage of federated learning.
Drawings
FIG. 1 is a flow chart of client and server interaction;
FIG. 2 is a diagram of the algorithm steps for the projection scheme "RandomProjection";
FIG. 3 is a comparison graph of model prediction accuracy for different numbers of Byzantine clients under a Gaussian attack, using the MNIST, Fashion-MNIST and CIFAR-10 datasets respectively, for Multi-Krum and MKRP;
FIG. 4 is a comparison graph of model prediction accuracy for different numbers of Byzantine clients under a label flipping attack, using the MNIST, Fashion-MNIST and CIFAR-10 datasets respectively, for Multi-Krum and MKRP;
FIG. 5 is a comparison graph of training times for Multi-Krum and MKRP using CIFAR-10 data sets.
Detailed Description
In order to explain technical contents, structural features, objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in combination with the embodiments.
The invention provides a robust federated learning method with efficient privacy protection, which adopts a setting of two servers and n clients. First, the two servers jointly select the n clients participating in federated learning and each distributes its own global model to every participating client. Each client initializes its local model with the global model, trains the local model on its local data set, generates a pair of arithmetic secret shares of the local model, and sends the corresponding secret share to each of the two servers. The servers build a shared matrix from all the received secret shares and project it into a dimension-reduced shared matrix using the same projection matrix. The projection is performed locally at each server and generates no communication overhead. Finally, under the protection of the MPC protocol, the two servers cooperatively perform Byzantine-resilient aggregation and output a new global model.
The robust federated learning method for efficient privacy protection of the present invention comprises the following steps,
Step 1, at the start of each iteration of the method, the two servers cooperatively select n clients to participate in the federated learning process and each broadcasts its own global model to them.
Step 2, each client initializes its local model with the received global model, trains the local model on its local data set, converts the floating point numbers in the local model into integers, secret-shares the local model using an MPC protocol, and sends the shares to the two servers.
Step 3, the two servers construct a shared matrix from the received secret shares and project it into the dimension-reduced shared matrix.
Step 4, the two servers cooperatively execute a Byzantine-resilient aggregation algorithm on the shared matrix to obtain a new global model.
Referring to FIG. 1, a flowchart of the client and server interaction is shown. After a client trains its local model with its local data set, the local model is secret-shared to the servers. The servers construct a shared matrix of the clients' local models, project it into a low-dimensional space, and then execute the Byzantine-resilient aggregation algorithm to obtain a new global model.
A preferred embodiment of the invention is disclosed below; the specific implementation details are as follows:
step 1, the two servers cooperatively select n clients and respectively broadcast the global model G of the two servers to the clients. This arrangement may prevent a single server from selecting colluding clients to help it infer private information of other clients. The client judges whether the two servers truthfully execute the aggregation protocol in the previous round by comparing the received two global models.
Step 2, client i initializes its local model L_i with the received global model G and trains L_i with its local data set. The client then generates a pair of secret shares of L_i, ⟨L_i⟩^A = (⟨L_i⟩^A_1, ⟨L_i⟩^A_2), using the arithmetic sharing function in the ABY framework, where ⟨L_i⟩^A_j denotes the j-th share in the two-party arithmetic secret sharing of L_i and A is an abbreviation of Arithmetic; ⟨L_i⟩^A_1 is sent to server 1 and ⟨L_i⟩^A_2 to server 2. That is, the local model L_i is the input and a pair of arithmetic shares ⟨L_i⟩^A_1 and ⟨L_i⟩^A_2 is the output: (⟨L_i⟩^A_1, ⟨L_i⟩^A_2) ← ArithmeticSharing(L_i), where ArithmeticSharing(·) is the arithmetic sharing function in ABY.
It should be noted that the arithmetic sharing function in ABY only supports integer inputs, so the client first converts the floating point numbers in L_i to integers and then arithmetically shares L_i.
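A minimal sketch of this client-side step is given below, assuming a 2^64 share ring and a fixed-point scale of 2^16; it imitates, but does not use, ABY's arithmetic sharing.

```python
# Sketch of the client-side sharing in step 2: encode floats as fixed-point ring
# integers, then split the encoded model into two additive shares, one per server.
# The 2^64 ring and the 2^16 fixed-point scale are illustrative assumptions and
# only imitate (do not use) ABY's arithmetic sharing.
import numpy as np

SCALE = 2 ** 16   # fixed-point scaling factor for the float -> integer conversion
# shares live in the ring Z_{2^64}; numpy uint64 arithmetic wraps modulo 2^64

def encode(model):
    # map floats to ring elements (negative values wrap, two's-complement style)
    return np.round(model * SCALE).astype(np.int64).astype(np.uint64)

def arithmetic_share(encoded, rng):
    # share_1 is uniformly random in the ring; share_2 completes the sum mod 2^64
    share_1 = rng.integers(0, 2 ** 64 - 1, size=encoded.shape,
                           dtype=np.uint64, endpoint=True)
    share_2 = encoded - share_1            # wraps modulo 2^64
    return share_1, share_2                # sent to server 1 and server 2

rng = np.random.default_rng(42)
local_model = rng.normal(size=1000)        # stand-in for the trained local model L_i
s1, s2 = arithmetic_share(encode(local_model), rng)
assert np.array_equal(s1 + s2, encode(local_model))   # the shares reconstruct L_i
```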
Step 3, each server collects all the secret shares sent by all clients and constructs a shared matrix ⟨M⟩^A, where the i-th row of the shared matrix is ⟨L_i⟩^A. Then both servers use the same random projection matrix to project ⟨M⟩^A into the dimension-reduced local model shared matrix ⟨M′⟩^A, i.e. ⟨M′⟩^A ← RandomProjection(⟨M⟩^A), where RandomProjection(·) is the random projection function; this process generates no communication overhead.
The random projection function RandomProjection is derived from Theorem 1.
Theorem 1. Let P be an arbitrary set of n points in the d-dimensional real space ℝ^d, represented as an n × d matrix M, where n, d > 0. Given ε, μ > 0, let
k₀ = (4 + 2μ)·log n / (ε²/2 − ε³/3).
For an integer k > k₀, let S be a d × k random matrix with S(i, j) = s_ij, where the {s_ij} are independent random variables drawn from the following probability distribution:
s_ij = +1 with probability 1/2, and s_ij = −1 with probability 1/2.
Let M′ = (1/√k)·M·S, and let f: ℝ^d → ℝ^k map the i-th row of M to the i-th row of M′. Then for all u, v ∈ P, with probability at least 1 − n^(−μ),
(1 − ε)·‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε)·‖u − v‖².
From Theorem 1, the same guarantee as in the Johnson-Lindenstrauss lemma can be obtained even when the projection matrix is drawn from a simple uniform distribution. The projection matrix contains only the elements +1 and −1, so computing its product with the local model shares requires only free addition and subtraction operations. However, this product is preceded by the coefficient 1/√k, where k > k₀ = (4 + 2μ)·log n / (ε²/2 − ε³/3), ε is the expected distance-preservation error, μ is the projection success probability, and n represents the number of clients. Since 1/√k is usually a floating point number and ABY does not support such an operation, M′ = (1/√k)·M·S is replaced directly by M′ = M·S. The distance relationship between the local models in the low-dimensional space depends only on the projection matrix; the coefficient 1/√k exists merely to reduce the error between distances in the low-dimensional space and the original-dimensional space, and distance-based Byzantine-resilient aggregation rules usually detect outliers from the distance relationship between local models, so this replacement does not reduce robustness. The algorithmic details of the projection scheme RandomProjection are shown in FIG. 2.
Both servers need to execute the projection algorithm. Each server first obtains an identical random projection matrix R using the same random seed, where the elements of R are public constants (held in constant form). Each server then locally computes the product of R and its share of the local model matrix to obtain a share of the low-dimensional local model matrix. After the random projection, the distances between the low-dimensional local models are computed directly and approximate the distances between the original-dimension local models with small error, so the number of MPC multiplications is greatly reduced. Thus, the random projection scheme can be applied to reduce the overhead of most MPC-based implementations of distance-based Byzantine-resilient aggregation rules.
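The following sketch illustrates this step under stated assumptions (a uint64 share ring, illustrative sizes and seed): both servers derive the same +/-1 matrix from a common seed and project their shares locally, and projecting the shares and then reconstructing gives the same result as projecting the reconstructed matrix.

```python
# Sketch of the RandomProjection step of FIG. 2: both servers derive the same
# +/-1 matrix R from a shared seed, and since multiplying by +/-1 needs only ring
# additions and subtractions, each server can project its additive share locally,
# with no communication. All sizes, the seed and the uint64 ring are assumptions.
import numpy as np

def projected_dimension(n_clients, eps=0.3, mu=1.0):
    # k_0 = (4 + 2*mu) * log(n) / (eps^2/2 - eps^3/3), the bound from Theorem 1
    return int(np.ceil((4 + 2 * mu) * np.log(n_clients)
                       / (eps ** 2 / 2 - eps ** 3 / 3)))

def project_share(share_matrix, R_signs):
    # share_matrix: (n, d) uint64 additive share; R_signs: (d, k) entries in {+1, -1}.
    # Output column j = (sum of share columns where R is +1) - (sum where R is -1);
    # uint64 arithmetic wraps modulo 2^64, i.e. stays inside the share ring.
    n, k = share_matrix.shape[0], R_signs.shape[1]
    out = np.zeros((n, k), dtype=np.uint64)
    for j in range(k):
        plus = share_matrix[:, R_signs[:, j] == 1].sum(axis=1, dtype=np.uint64)
        minus = share_matrix[:, R_signs[:, j] == -1].sum(axis=1, dtype=np.uint64)
        out[:, j] = plus - minus
    return out

n, d = 20, 4096
k = projected_dimension(n)                 # roughly 500 for these parameters
seed = 2022                                # random seed agreed on by both servers
R = np.random.default_rng(seed).choice(np.array([-1, 1], dtype=np.int8), size=(d, k))

rng = np.random.default_rng(7)
s0 = rng.integers(0, 2 ** 64 - 1, size=(n, d), dtype=np.uint64, endpoint=True)
s1 = rng.integers(0, 2 ** 64 - 1, size=(n, d), dtype=np.uint64, endpoint=True)
# projecting each share locally and adding equals projecting the reconstructed matrix
assert np.array_equal(project_share(s0, R) + project_share(s1, R),
                      project_share(s0 + s1, R))
```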
Step 4, using the MPC framework ABY, the two servers cooperatively execute the privacy-preserving Byzantine-resilient aggregation algorithm Multi-Krum on ⟨M′⟩^A and ⟨M⟩^A, where the two servers only learn whether a given client is selected to participate in the aggregation. Denote by ⟨x⟩^B the Boolean shares of x generated with ABY, where B stands for Boolean. Some basic operations on arithmetic shares are first defined:
·ADD(⟨x⟩^A, ⟨y⟩^A): returns the arithmetic sharing ⟨x + y⟩^A of x + y. Similarly, SUB(⟨x⟩^A, ⟨y⟩^A) returns the arithmetic sharing ⟨x − y⟩^A of x − y, and MUL(⟨x⟩^A, ⟨y⟩^A) returns the arithmetic sharing ⟨x · y⟩^A of x · y.
·GT(⟨x⟩^B, ⟨y⟩^B): if x > y, returns a Boolean sharing ⟨1⟩^B of 1; otherwise returns a Boolean sharing ⟨0⟩^B of 0.
·SWAP(⟨x⟩^B, ⟨y⟩^B, ⟨s⟩^B): if s = 1, returns (⟨y⟩^B, ⟨x⟩^B); otherwise returns (⟨x⟩^B, ⟨y⟩^B).
The following functions are then implemented, serving as basic building blocks for the aggregation algorithm:
·⟨D⟩^A ← DistanceToOther(⟨M⟩^A): the function takes the shared matrix ⟨M⟩^A as input and outputs a shared matrix ⟨D⟩^A of pairwise squared distances, where ⟨D[i][j]⟩^A and ⟨D[j][i]⟩^A both denote the share of the squared distance between the i-th and j-th rows of ⟨M⟩^A. In ABY, the addition ADD, subtraction SUB and multiplication MUL operations are required.
·⟨D′⟩^A ← SortValue(⟨D⟩^A, axis): the function takes the shared matrix ⟨D⟩^A and an integer axis ∈ {0, 1} as input and outputs a sorted shared matrix ⟨D′⟩^A. If axis is 0, the function sorts each row of ⟨D⟩^A in ascending order; otherwise it sorts each column of ⟨D⟩^A in ascending order. The function implements a privacy-preserving bitonic sorting algorithm, which requires the comparison GT and exchange SWAP operations.
·⟨I⟩^A ← SortIndexByValue(⟨S⟩^A): the function takes a shared vector ⟨S⟩^A as input and outputs a shared index vector ⟨I⟩^A that sorts ⟨S⟩^A in ascending order. The function comprises the SortValue function and both the GT and SWAP operations in ABY.
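For clarity, a plaintext reference of the semantics of these three building blocks is sketched below; the invention evaluates them on ABY shares, whereas this sketch operates on clear numpy arrays purely for illustration.

```python
# Plaintext reference of the semantics of the three building blocks. The invention
# evaluates them on ABY secret shares; this version operates on clear numpy arrays
# purely to illustrate what each function computes.
import numpy as np

def distance_to_other(M):
    # analogue of DistanceToOther: squared Euclidean distance between every pair
    # of rows of M
    sq = np.sum(M ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * M @ M.T

def sort_value(D, axis):
    # analogue of SortValue: axis == 0 sorts each row ascending,
    # any other value sorts each column ascending
    return np.sort(D, axis=1) if axis == 0 else np.sort(D, axis=0)

def sort_index_by_value(S):
    # analogue of SortIndexByValue: indices that sort the score vector S ascending
    return np.argsort(S)
```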
The aggregation algorithm takes the shared matrix of the original local models ⟨M⟩^A, the shared matrix of the k-dimensional local models ⟨M′⟩^A and the number of Byzantine clients β as input, and outputs a new global model G, i.e. G ← RobustAggregation(⟨M′⟩^A, ⟨M⟩^A, β). The specific process is as follows: 1) the two servers jointly compute the DistanceToOther function with ⟨M′⟩^A as input to obtain the pairwise distances between the k-dimensional local models, i.e. ⟨D⟩^A ← DistanceToOther(⟨M′⟩^A); 2) the two servers jointly compute the SortValue function with ⟨D⟩^A and the integer 0 as input, sorting the distances from each local model to all other local models in ascending order, i.e. ⟨D′⟩^A ← SortValue(⟨D⟩^A, 0); 3) the two servers locally compute, for each local model, the sum ⟨S[i]⟩^A of its n−β−1 smallest distances to the other local models, where n is the number of clients and β is the number of Byzantine clients; 4) the two servers jointly compute the SortIndexByValue function with ⟨S⟩^A as input and output the local model indices corresponding to the sorted scores, i.e. ⟨I⟩^A ← SortIndexByValue(⟨S⟩^A); 5) finally, the two servers exchange ⟨I⟩^A, recover the index vector I, and locally compute the sum of the original-dimension local models corresponding to the first n−β−1 indices, i.e. ⟨G⟩^A = Σ_{j=1}^{n−β−1} ⟨M[I_j]⟩^A. The two servers then cooperate to recover G and divide it by n−β−1 to obtain the new global model G. It can be seen that throughout the privacy-preserving Multi-Krum algorithm, only the index vector I and the new global model G are exposed to the servers.
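A plaintext reference of this five-step process is sketched below; the parameter values and the toy Byzantine models are assumptions, and the real protocol performs steps 1) through 4) on ABY shares rather than on clear values.

```python
# End-to-end plaintext reference of the five RobustAggregation steps. The real
# protocol runs steps 1-4 on ABY shares; this clear-text version, with toy
# Gaussian "Byzantine" models, only illustrates the logic and the selection rule.
import numpy as np

def robust_aggregation(M, M_proj, beta):
    n = M.shape[0]
    sq = np.sum(M_proj ** 2, axis=1)                       # 1) pairwise squared
    D = sq[:, None] + sq[None, :] - 2 * M_proj @ M_proj.T  #    distances in k dims
    D_sorted = np.sort(D, axis=1)                          # 2) sort each row
    S = D_sorted[:, 1:n - beta].sum(axis=1)                # 3) sum of the n-beta-1
                                                           #    smallest distances
    I = np.argsort(S)                                      # 4) rank the scores
    selected = I[:n - beta - 1]                            # 5) keep the best n-beta-1
    return M[selected].sum(axis=0) / (n - beta - 1)        #    and average them

rng = np.random.default_rng(0)
n, d, beta = 20, 1000, 4
M = rng.normal(0.0, 0.1, size=(n, d))                      # benign local models
M[:beta] = rng.normal(5.0, 1.0, size=(beta, d))            # toy Byzantine local models
S_proj = rng.choice([-1.0, 1.0], size=(d, 200))            # +/-1 projection matrix
G = robust_aggregation(M, (M @ S_proj) / np.sqrt(200), beta)
print(G.shape)                                             # new global model, (d,)
```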
Step 5, experimental verification. The scheme considers image datasets from three classification tasks that require high-dimensional models: MNIST, Fashion-MNIST and CIFAR-10. MNIST and Fashion-MNIST each consist of 70,000 28×28 grayscale images, and CIFAR-10 contains 60,000 32×32 color images. A 10-class classifier was trained on each of the three datasets. For MNIST and Fashion-MNIST, the same deep neural network (DNN) is used, with an architecture of 2 convolutional layers and 2 fully connected layers. For CIFAR-10, a lightweight ResNet-18 is used for classification; unlike the original ResNet-18, the lightweight version modifies the size of the first convolutional layer to 3×3.
To evaluate the effect of random projection on Multi-Krum, Multi-Krum and Multi-Krum with random projection (MKRP) were evaluated on the three datasets. Referring to FIG. 3, the accuracy with different numbers of Byzantine clients under a Gaussian attack shows that random projection has essentially no effect on Multi-Krum. For the same β, the accuracy of MKRP on MNIST, Fashion-MNIST and CIFAR-10 is at most 0.2%, 0.2% and 0.6% lower than that of Multi-Krum, respectively, which is almost negligible. As β increased from 4 to 12, the accuracy of Multi-Krum on MNIST, Fashion-MNIST and CIFAR-10 decreased by 0.1% and 0.8% respectively, while that of MKRP decreased by 0.3%, 0.3% and 0.9% respectively. The additional reduction of MKRP relative to Multi-Krum is at most 0.3%, which is negligible. Thus, the effect of random projection on Multi-Krum does not change with β.
The accuracy for different numbers of Byzantine clients was also tested under a label flipping attack. The difference from the Gaussian attack is that the Byzantine local models and the benign local models follow different distributions. The difference in scores between the Byzantine local models and the benign local models depends on the model and the dataset, so the effect of the random projection differs. Referring to FIG. 4, for the same number of Byzantine clients, the accuracy of MKRP on MNIST, Fashion-MNIST and CIFAR-10 decreased by at most 0.2%, 1% and 5% respectively compared with Multi-Krum.
Using the CIFAR-10 dataset, the training times of Multi-Krum and MKRP are shown in FIG. 5. It can be seen that, thanks to the random projection strategy, the training time of MKRP is about 40 times shorter than that of Multi-Krum, which indicates that the random projection operation can effectively improve the efficiency of the federated learning algorithm.
The invention has the following advantages: 1. the number of privacy-preserving multiplications in the aggregation algorithm is reduced; 2. the projection operation has almost no influence on the robustness of the aggregation algorithm; 3. the privacy-preserving Byzantine-resilient aggregation algorithm Multi-Krum exposes only the indices of the selected clients and the new global model, preserving the data privacy advantage of federated learning; 4. on the premise of guaranteeing the privacy and robustness of the federated learning algorithm, its efficiency is effectively improved.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "include", "including" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device including a series of elements includes not only those elements but also other elements not explicitly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or terminal device that comprises the element. Further, in this document, "greater than", "less than", "more than", and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include the stated number.
Although the embodiments have been described, those skilled in the art may make other variations and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the above description covers only the described embodiments of the present invention and is not intended to limit the scope of the invention; all changes that fall within the scope of the invention are intended to be covered by it.

Claims (7)

1. A robust federated learning method with efficient privacy protection is characterized by comprising the following steps,
step 1, two servers cooperatively select n clients to participate in the federated learning process, and each broadcasts its own global model to the clients;
step 2, the client initializes the local model by using the received global model, trains the local model by using a local data set, and sends the trained local model to two servers;
step 3, the two servers construct a shared matrix from the received local models and project it into a dimension-reduced shared matrix;
step 4, the two servers cooperatively execute a Byzantine-resilient aggregation algorithm on the shared matrix to obtain a new global model.
2. The robust federated learning method with efficient privacy protection as defined in claim 1, wherein in step 2 the floating point numbers in the trained local model are converted into integers, and the local model is then secret-shared via an MPC protocol and sent to the two servers.
3. The robust federated learning method with efficient privacy protection as defined in claim 1, wherein in step 2 client i initializes the local model L_i with the received global model G and trains L_i with the local data set; the client then generates a pair of secret shares of L_i, ⟨L_i⟩^A = (⟨L_i⟩^A_1, ⟨L_i⟩^A_2), using the arithmetic sharing function in the ABY framework, where ⟨L_i⟩^A_j denotes the j-th share in the two-party arithmetic secret sharing of L_i; ⟨L_i⟩^A_1 is sent to server 1 and ⟨L_i⟩^A_2 to server 2.
4. The robust federated learning method with efficient privacy protection as claimed in claim 3, wherein in step 3 each server collects all the secret shares sent by all clients and constructs a shared matrix ⟨M⟩^A, where the i-th row of the shared matrix is ⟨L_i⟩^A; then both servers use the same random projection matrix to project ⟨M⟩^A into the dimension-reduced local model shared matrix ⟨M′⟩^A.
5. The robust federated learning method with efficient privacy protection as defined in claim 3, wherein the product of the projection matrix and the local model shares in step 3 is preceded by a coefficient 1/√k, where k > k₀ = (4 + 2μ)·log n / (ε²/2 − ε³/3), ε is the expected distance-preservation error, μ is the projection success probability, and n represents the number of clients.
6. The robust federated learning method with efficient privacy protection as defined in claim 3, wherein before the Byzantine-resilient aggregation algorithm of step 4 is constructed, three basic modules are built first: 1) ⟨D⟩^A ← DistanceToOther(⟨M⟩^A): the function takes the shared matrix ⟨M⟩^A as input and outputs a shared matrix ⟨D⟩^A of pairwise squared distances, where ⟨D[i][j]⟩^A and ⟨D[j][i]⟩^A both denote the share of the squared distance between the i-th and j-th rows of ⟨M⟩^A; the addition ADD, subtraction SUB and multiplication MUL operations are required in ABY; 2) ⟨D′⟩^A ← SortValue(⟨D⟩^A, axis): the function takes the shared matrix ⟨D⟩^A and an integer axis ∈ {0, 1} as input and outputs a sorted shared matrix ⟨D′⟩^A; if axis is 0 the function sorts each row of ⟨D⟩^A in ascending order, otherwise it sorts each column of ⟨D⟩^A in ascending order; the function implements a privacy-preserving bitonic sorting algorithm, which requires the comparison GT and exchange SWAP operations; 3) ⟨I⟩^A ← SortIndexByValue(⟨S⟩^A): the function takes a shared vector ⟨S⟩^A as input and outputs a shared index vector ⟨I⟩^A that sorts ⟨S⟩^A in ascending order; the function comprises the SortValue function and the GT and SWAP operations of ABY.
7. The robust federated learning method with efficient privacy protection as defined in claim 6, wherein in the Byzantine-resilient aggregation algorithm of step 4 the two servers exchange ⟨I⟩^A, recover the index vector I, and then locally compute the sum of the original-dimension local models corresponding to the first n−β−1 indices, ⟨G⟩^A = Σ_{j=1}^{n−β−1} ⟨M[I_j]⟩^A, where n is the number of clients and β is the number of Byzantine clients; the two servers cooperatively recover G and divide it by n−β−1 to obtain the new global model G.
CN202211382903.1A 2022-11-07 2022-11-07 Robust federated learning method with efficient privacy protection Pending CN115660050A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211382903.1A CN115660050A (en) 2022-11-07 2022-11-07 Robust federated learning method with efficient privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211382903.1A CN115660050A (en) 2022-11-07 2022-11-07 Robust federated learning method with efficient privacy protection

Publications (1)

Publication Number Publication Date
CN115660050A true CN115660050A (en) 2023-01-31

Family

ID=85016578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211382903.1A Pending CN115660050A (en) 2022-11-07 2022-11-07 Robust federated learning method with efficient privacy protection

Country Status (1)

Country Link
CN (1) CN115660050A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116756764A (en) * 2023-05-04 2023-09-15 浙江大学 Model blocking aggregation privacy protection method for lithography hotspot detection
CN116756764B (en) * 2023-05-04 2024-06-04 浙江大学 Model blocking aggregation privacy protection method for lithography hotspot detection
CN116822647A (en) * 2023-05-25 2023-09-29 大连海事大学 Model interpretation method based on federal learning
CN116822647B (en) * 2023-05-25 2024-01-16 大连海事大学 Model interpretation method based on federal learning
CN117395067A (en) * 2023-11-08 2024-01-12 西安电子科技大学 User data privacy protection system and method for Bayesian robust federal learning
CN117395067B (en) * 2023-11-08 2024-04-19 西安电子科技大学 User data privacy protection system and method for Bayesian robust federal learning
CN117808082A (en) * 2024-02-29 2024-04-02 华侨大学 Federal learning method, device, equipment and medium for privacy protection against Bayesian attack
CN117808082B (en) * 2024-02-29 2024-05-14 华侨大学 Federal learning method, device, equipment and medium for privacy protection against Bayesian attack

Similar Documents

Publication Publication Date Title
Yang et al. A quasi-newton method based vertical federated learning framework for logistic regression
CN115660050A (en) Robust federated learning method with efficient privacy protection
Tran et al. An efficient approach for privacy preserving decentralized deep learning models based on secure multi-party computation
Ritter et al. Morphological bidirectional associative memories
CN112380495B (en) Secure multiparty multiplication method and system
CN115186831B (en) Efficient privacy protection deep learning method
Zheng et al. Industrial scale privacy preserving deep neural network
Liu et al. Adaptive multi-channel bayesian graph attention network for iot transaction security
Shao et al. A survey of what to share in federated learning: perspectives on model utility, privacy leakage, and communication efficiency
Zhang et al. Federated learning with quantum secure aggregation
Wang et al. BRIEF but powerful: Byzantine-robust and privacy-preserving federated learning via model segmentation and secure clustering
Wang et al. Attention reweighted sparse subspace clustering
CN114513337A (en) Privacy protection link prediction method and system based on mail data
Chen et al. Adversarial learning from crowds
Yu et al. Security and Privacy in Federated Learning
CN117391816A (en) Heterogeneous graph neural network recommendation method, device and equipment
CN115719085B (en) Deep neural network model inversion attack defense method and device
Deng et al. Non-interactive and privacy-preserving neural network learning using functional encryption
Yi et al. pFedES: Model Heterogeneous Personalized Federated Learning with Feature Extractor Sharing
CN115862751A (en) Quantum chemistry property calculation method for updating polymerization attention mechanism based on edge features
Wei et al. GNN-Ensemble: Towards Random Decision Graph Neural Networks
CN115481415A (en) Communication cost optimization method, system, device and medium based on longitudinal federal learning
Luo et al. Practical privacy-preserving Gaussian process regression via secret sharing
CN116091891A (en) Image recognition method and system
CN115131605A (en) Structure perception graph comparison learning method based on self-adaptive sub-graph

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination