CN115861705A - Federal learning method for eliminating malicious clients - Google Patents

Federal learning method for eliminating malicious clients

Info

Publication number
CN115861705A
Authority
CN
China
Prior art keywords
data set
resnet
clients
client
classification model
Prior art date
Legal status
Pending
Application number
CN202211638722.0A
Other languages
Chinese (zh)
Inventor
张剑飞
周超然
张婧
杨宏伟
冯欣
杨佳东
Current Assignee
Changchun University of Science and Technology
Original Assignee
Changchun University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Changchun University of Science and Technology filed Critical Changchun University of Science and Technology
Priority to CN202211638722.0A priority Critical patent/CN115861705A/en
Publication of CN115861705A publication Critical patent/CN115861705A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a federated learning method for eliminating malicious clients, and relates to the field of image classification. A preset proportion of clients is selected from all clients and sent a check command, while the ResNet-101 classification model of the current round is sent to the unselected clients. The selected clients perform data set distillation based on the ResNet-101 classification model of the previous round and their local data sets, and the server rejects any selected client whose cumulative score is below the score threshold. The unselected clients train locally, and the server updates the ResNet-101 classification model according to the gradients uploaded by all unselected clients. The final ResNet-101 classification model is obtained after multiple iterations. The method screens out and eliminates malicious clients that intentionally upload erroneous gradients, and ensures the reliability and safety of training among multiple clients without reducing training accuracy.

Description

Federated learning method for eliminating malicious clients
Technical Field
The invention relates to the field of image classification, and in particular to a federated learning method for eliminating malicious clients.
Background
In recent years, with the rapid development of big data and the Internet of Things, the volume of data has grown explosively, and artificial intelligence technology based on big data has developed rapidly as well. Within this mass of data, the problem of protecting user privacy is especially prominent. In ordinary distributed machine learning, the local data of many clients is sent directly to a server for centralized training, which undoubtedly increases the risk of leaking user privacy. Moreover, because of industry competition, privacy and security concerns, and complex administrative procedures, data often exists in isolated islands, and the data of each client is difficult to use directly for training. Researchers have therefore considered how to complete the training process without uploading clients' local data, and federated learning arose accordingly. The federated learning technique was first proposed by Google in 2016 and then applied to training and deploying the next-word prediction model for a mobile-device input method. The design goal of federated learning is to train a machine learning model without concentrating all data on a central server, while ensuring the safety of the data of every user participating in training. A client's data therefore completes model training only locally, which solves the privacy protection problem to a certain extent, allows clients' data to be modeled and learned from through secure interaction among multiple clients, and achieves a desirable common benefit.
However, in the practical application environment of federated learning, a malicious client can affect federated training in the following ways: degrading model performance, causing privacy leakage, disrupting the global model aggregation process, and uploading false gradients so that the model cannot converge quickly. The presence of lazy clients also creates a risk of privacy leakage.
Therefore, how to screen clients is of great significance for improving the security of the federated learning process and ensuring its efficiency.
Disclosure of Invention
The invention aims to provide a federated learning method for eliminating malicious clients, which ensures the reliability and safety of training among multiple clients without reducing training accuracy.
In order to achieve the purpose, the invention provides the following scheme:
A federated learning method for eliminating malicious clients comprises the following steps:
dividing the ImageNet data set into a plurality of subsets and respectively distributing the subsets to each client as a local data set of each client;
selecting clients with a preset proportion from all the clients, sending a check command, and sending the ResNet-101 classification model of the current turn to the unselected clients;
the selected client side carries out data set distillation according to the ResNet-101 classification model and the local data set of the previous round, and a distillation data set obtained after the data set distillation is uploaded to the server;
the server scores according to the distillation data set and by combining the ResNet-101 classification model of the previous round and the gradient uploaded by the selected client in the previous round, and obtains the accumulated score of the selected client;
enabling the selected clients whose cumulative scores are greater than or equal to the score threshold to participate in the next round of federated training, and rejecting the selected clients whose cumulative scores are below the score threshold;
the unselected clients perform local training based on the local data sets of the clients and the ResNet-101 classification model of the current round, and upload the gradient obtained after training to the server;
the server calculates a ResNet-101 classification model of the next round according to the gradients uploaded by all the unselected clients;
and when the number of rounds reaches the global iteration count, stopping the federated training to obtain the final ResNet-101 classification model.
Optionally, the selected client performing data set distillation according to the ResNet-101 classification model of the previous round and the local data set specifically includes:
randomly initializing, by the selected client, the learning rate $\eta$ and a distillation data set $\tilde{D}$ consisting of m data;
randomly selecting, by the selected client, b data from the local data set D to form a mini-batch $D_{batch}$;
updating the parameters of the ResNet-101 classification model of the previous round by gradient descent according to the learning rate $\eta$ and the distillation data set $\tilde{D}$, to obtain updated model parameters $\theta_{upd}$;
updating the distillation data set $\tilde{D}$ and the learning rate $\eta$ by gradient descent based on the mini-batch $D_{batch}$ and the updated model parameters $\theta_{upd}$;
computing the cross-entropy loss function $f(\tilde{D}; \theta_{upd})$ based on the updated distillation data set and the updated model parameters $\theta_{upd}$;
if $f(\tilde{D}; \theta_{upd}) > end_f$ and the maximum number of iterations T has not been reached, replacing the learning rate $\eta$ with the updated learning rate, replacing the distillation data set $\tilde{D}$ with the updated distillation data set, and returning to the step "randomly selecting, by the selected client, b data from the local data set D to form a mini-batch $D_{batch}$", where $end_f$ is the upper bound on the loss function;
if $f(\tilde{D}; \theta_{upd}) \le end_f$ or the maximum number of iterations T has been reached, ending the data set distillation and outputting the updated distillation data set.
Optionally, updating the parameters of the ResNet-101 classification model of the previous round by gradient descent according to the learning rate $\eta$ and the distillation data set $\tilde{D}$, to obtain the updated model parameters $\theta_{upd}$, specifically includes:
computing, from the distillation data set $\tilde{D}$, the gradient $\nabla_{\theta_{orig}} f(\tilde{D}; \theta_{orig})$ of the parameters $\theta_{orig}$ of the ResNet-101 classification model of the previous round, where $f(\tilde{D}; \theta_{orig})$ is the cross-entropy loss function computed from the distillation data set $\tilde{D}$ and the parameters $\theta_{orig}$, and $\nabla_{\theta_{orig}} f(\tilde{D}; \theta_{orig})$ is the gradient obtained by taking the partial derivative of $f(\tilde{D}; \theta_{orig})$ with respect to $\theta_{orig}$;
computing the updated model parameters $\theta_{upd} = \theta_{orig} - \eta\, \nabla_{\theta_{orig}} f(\tilde{D}; \theta_{orig})$ according to the learning rate $\eta$ and the parameters $\theta_{orig}$ of the ResNet-101 classification model of the previous round.
Optionally, the server performs scoring according to the distillation data set by combining the ResNet-101 classification model of the previous round and the gradient uploaded by the selected client in the previous round, to obtain a cumulative score of the selected client, which specifically includes:
the server calculates the gradient of the ResNet-101 classification model of the previous round according to the distillation data set and the ResNet-101 classification model of the previous round;
calculating cosine similarity between the gradient of the ResNet-101 classification model of the previous round and the gradient uploaded by the selected client in the previous round;
calculating the score of the selected client in the current turn according to the cosine similarity;
and adding the score of the current turn with the accumulated score before the current turn to obtain the updated accumulated score of the selected client.
Optionally, the cosine similarity is calculated as
$C_k = \dfrac{\nabla\theta_{t-1}^{\tilde{D}_k} \cdot \nabla\theta_{t-1}^{k}}{\left\|\nabla\theta_{t-1}^{\tilde{D}_k}\right\| \left\|\nabla\theta_{t-1}^{k}\right\|}$
where $C_k$ is the cosine similarity, $\nabla\theta_{t-1}^{\tilde{D}_k}$ is the gradient of the ResNet-101 classification model of round t-1 computed by the server from the distillation data set $\tilde{D}_k$ uploaded by the client k selected in the current (t-th) round, and $\nabla\theta_{t-1}^{k}$ is the gradient uploaded by the selected client k in round t-1;
the score $S_{k,t}$ of the selected client in the current round is calculated from the cosine similarity $C_k$, where L is the scaling factor of the score, L > 0, Q is the tolerance factor, B is the critical factor, V is the fully malicious factor, K is the speed factor of the score, and K > 1.
Optionally, the ResNet-101 classification model of the next round, $\theta_{t+1}$, is obtained from the ResNet-101 classification model of round t, $\theta_t$, by a gradient-descent update that aggregates the gradients $\nabla\theta_t^i$ uploaded by the unselected clients i in round t, weighted by their local data set sizes $n_i$ and restricted to clients whose cumulative score $S_i$ in round t is not less than the score threshold $S_{limit}$; $\alpha$ is the learning rate used when the model parameters are updated by gradient descent, and $S_{init}$ is the initial cumulative score of client i.
Optionally, before selecting a preset proportion of clients from all the clients and sending the check command, and sending the ResNet-101 classification model of the current round to the unselected clients, the method further includes:
the server initializing the ResNet-101 classification model $\theta_0$ and initializing the cumulative score of every client to $S_{init}$;
if the current round is the first round, sending the ResNet-101 classification model $\theta_0$ to each client.
Optionally, the same client cannot be selected in two consecutive rounds.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a federal learning method for eliminating malicious clients, which comprises the steps of firstly, selecting clients with a preset proportion from all the clients and sending a check command, and simultaneously sending ResNet-101 classification models of the current round to unselected clients, secondly, carrying out data set distillation on the selected clients according to the ResNet-101 classification models and a local data set of the previous round, grading the selected clients by a server according to a distillation data set, and eliminating the selected clients with accumulated grades smaller than a grading threshold value; the unselected clients perform local training based on local data sets of the unselected clients and the ResNet-101 classification model of the current turn, and the server updates the ResNet-101 classification model according to gradients uploaded by all the unselected clients; and finally, obtaining a final ResNet-101 classification model through multiple iterations. The method and the system can screen and eliminate malicious clients which intentionally upload wrong gradients, and ensure the reliability and safety of training among multiple clients on the premise of not reducing the training accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
Fig. 1 is a flowchart of a federated learning method for eliminating malicious clients according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a federated learning method for eliminating a malicious client according to an embodiment of the present invention;
FIG. 3 is a framework diagram of a federated learning process provided by an embodiment of the present invention;
FIG. 4 is a flow chart of data set distillation provided by an embodiment of the present invention;
fig. 5 is a flowchart of a cumulative scoring process according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
A malicious client is a client with offensive or threatening behavior; such clients pose a serious threat to the federated learning algorithm and, in severe cases, can cause client privacy to be leaked. Therefore, in order to eliminate malicious clients, the invention provides a federated learning method for eliminating malicious clients, which mainly screens out and eliminates malicious clients that intentionally upload erroneous gradients and lazy clients that only obtain the model without participating in training. The method ensures the reliability and safety of training among multiple clients without reducing training accuracy.
The federated learning method for eliminating malicious clients provided by the embodiment of the invention performs federated training between the server and the clients, scores each client during training, retains the clients that pass the evaluation, and eliminates the malicious clients. As shown in fig. 1 to 3, the method comprises the following steps:
step S1, dividing the ImageNet data set into a plurality of subsets and respectively distributing the subsets to each client as a local data set of each client.
Step S2, selecting clients with a preset proportion from all the clients, sending a check command, and sending the ResNet-101 classification model of the current round to the unselected clients.
The server initializes the ResNet-101 classification model (hereinafter referred to as the global model) $\theta_0$ and initializes the cumulative score of every client to $S_{init}$; in this embodiment $S_{init} = 5$.
If the current round is the first round, the initial global model $\theta_0$ of the round is sent to each client. Otherwise, a proportion p of clients is selected from all the clients (the same client cannot be selected in two consecutive rounds), a check command is sent to the selected clients, and the global model $\theta_t$ of the current round is sent to the other, unselected clients.
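The selection rule of step S2 (a proportion p of clients, never repeating the previous round's selection) can be sketched as follows; the function name select_clients and the data structures are assumptions made for illustration.

    import random

    def select_clients(all_clients, p, last_selected, seed=None):
        """Step S2 sketch: pick a fraction p of the clients for checking,
        excluding the clients that were selected in the previous round."""
        rng = random.Random(seed)
        eligible = [c for c in all_clients if c not in last_selected]
        k = max(1, int(round(p * len(all_clients))))
        k = min(k, len(eligible))               # cannot select more than are eligible
        selected = rng.sample(eligible, k)
        unselected = [c for c in all_clients if c not in selected]
        return selected, unselected

    # Example: 10 clients, 20% checked per round, clients 2 and 7 were checked last round.
    selected, unselected = select_clients(list(range(10)), 0.2, last_selected={2, 7}, seed=0)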
Step S3, the selected clients perform data set distillation according to the ResNet-101 classification model of the previous round and their local data sets, and upload the distillation data sets obtained by the data set distillation to the server.
Data set distillation is performed separately for each selected client, and the resulting distillation data set is used to score that client. The parameters required for training are assumed to be: the client's local data set D, the model parameters $\theta$, the distillation data set $\tilde{D}$, the number of distillation data m, the learning rate $\eta$, the mini-batch $D_{batch}$, the mini-batch size b, the loss function f, the end condition $end_f$, and the maximum number of iterations T. Here D denotes the data set formed by all local data of the client participating in training; in this embodiment the ImageNet (ILSVRC 2012) data set is divided into a plurality of subsets that are respectively allocated to the clients. $\theta$ denotes the parameters of the ResNet-101 classification model used, where $\theta_{orig}$ denotes the initial parameters of the model before data set distillation starts, i.e. in this method the global model parameters of the previous round, and $\theta_{upd}$ denotes the model parameters obtained by updating the initial parameters $\theta_{orig}$ with gradient descent. $\tilde{D}$ denotes the distillation data set obtained by data set distillation; m denotes the number of data in $\tilde{D}$; $\eta$ denotes the hyperparameter controlling the extent of the parameter update when the model parameters are updated; $D_{batch}$ denotes a mini-batch randomly selected from the local data set; b denotes the number of data in the mini-batch; f denotes the loss function adopted in model training, which in this embodiment is the cross-entropy loss function; $end_f$ denotes the end condition of the data set distillation, i.e. the upper bound on the loss of the model parameters $\theta_{upd}$ on the distillation data set $\tilde{D}$ at the end of training, which in this embodiment is the loss function $f(\theta_{t-1}; D_k)$ of the client at the end of the previous round; and T denotes the maximum number of iterations of the data set distillation.
Referring to fig. 4, the detailed process of data set distillation is:
3.1 The client randomly initializes the learning rate $\eta$ and a distillation data set $\tilde{D}$ consisting of m data.
3.2 The client randomly selects b data from the local data set D to form a mini-batch $D_{batch}$.
3.3 Based on the distillation data set $\tilde{D}$, the gradient of the initial model parameters $\theta_{orig}$ is computed and the parameters are updated by gradient descent, giving the updated model parameters $\theta_{upd}$. The gradient-descent update is:
$f(\tilde{D}; \theta_{orig}) = -\frac{1}{n} \sum_{x \in \tilde{D}} \sum_{i=1}^{c} y_i \log p_{\theta_{orig}}(x)_i$  (1)
$\nabla_{\theta_{orig}} f(\tilde{D}; \theta_{orig}) = \frac{\partial f(\tilde{D}; \theta_{orig})}{\partial \theta_{orig}}$  (2)
$\theta_{upd} = \theta_{orig} - \eta\, \nabla_{\theta_{orig}} f(\tilde{D}; \theta_{orig})$  (3)
where $f(\tilde{D}; \theta_{orig})$ is the cross-entropy loss function computed from the distillation data set $\tilde{D}$ and the initial model parameters $\theta_{orig}$; n is the number of data in the distillation data set $\tilde{D}$; c is the total number of labels; y is the original label list of data x in the distillation data set $\tilde{D}$, and $y_i$ is the i-th original label of data x; $p_{\theta}(x)$ is the label list of data x predicted by the model with parameters $\theta$, and $p_{\theta}(x)_i$ is the i-th predicted label of data x; $\nabla_{\theta_{orig}} f(\tilde{D}; \theta_{orig})$ is the gradient obtained by taking the partial derivative of the loss function with respect to $\theta_{orig}$; and $\theta_{upd}$ are the model parameters obtained after updating by gradient descent.
3.4 Based on the mini-batch $D_{batch}$ and the updated model parameters $\theta_{upd}$, the loss function is computed and the distillation data set $\tilde{D}$ and the learning rate $\eta$ are updated by gradient descent:
$\tilde{D} = \tilde{D} - \lambda\, \nabla_{\tilde{D}} f(\theta_{upd}; D_{batch})$  (4)
$\eta = \eta - \lambda\, \nabla_{\eta} f(\theta_{upd}; D_{batch})$  (5)
where $f(\theta_{upd}; D_{batch})$ is the loss function computed by the client from the mini-batch $D_{batch}$ and the updated model parameters $\theta_{upd}$; $\nabla_{\tilde{D}} f(\theta_{upd}; D_{batch})$ is the gradient obtained by taking the partial derivative of the loss function with respect to the distillation data set $\tilde{D}$; $\nabla_{\eta} f(\theta_{upd}; D_{batch})$ is the gradient obtained by taking the partial derivative of the loss function with respect to $\eta$; and $\lambda$ is the step size (learning rate) used when updating the distillation data set $\tilde{D}$ and the learning rate $\eta$ by gradient descent.
3.5 Based on the distillation data set $\tilde{D}$ and the updated model parameters $\theta_{upd}$, the loss function $f(\tilde{D}; \theta_{upd})$ is computed. If $f(\tilde{D}; \theta_{upd}) \le end_f$ is satisfied or the maximum number of iterations T is reached, the data set distillation ends and $\tilde{D}$ is returned; otherwise, the procedure returns to step 3.2 for the next iteration.
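Steps 3.1 to 3.5 can be summarized in the following PyTorch-style sketch. It is only an illustrative reading of the procedure under simplifying assumptions: a single weight matrix of a linear classifier stands in for the ResNet-101 parameters, the synthetic labels are kept fixed, and the helper name distill_dataset is an assumption.

    import torch
    import torch.nn.functional as F

    def distill_dataset(theta_orig, local_x, local_y, num_classes,
                        m=10, b=32, T=100, end_f=0.5, lam=0.01):
        """Sketch of steps 3.1-3.5: learn m synthetic samples (and a learning
        rate eta) such that one gradient step on them behaves like training
        on the real local data. theta_orig is the previous-round model,
        here a (d x num_classes) weight matrix with requires_grad=True."""
        d = local_x.shape[1]
        # 3.1 random initialisation of the distilled data and the learning rate eta
        syn_x = torch.randn(m, d, requires_grad=True)
        syn_y = torch.randint(0, num_classes, (m,))
        eta = torch.tensor(0.1, requires_grad=True)

        def loss(theta, x, y):                  # cross-entropy f(D; theta), eq. (1)
            return F.cross_entropy(x @ theta, y)

        for _ in range(T):
            # 3.2 sample a real mini-batch D_batch of size b
            idx = torch.randperm(local_x.shape[0])[:b]
            xb, yb = local_x[idx], local_y[idx]

            # 3.3 one gradient step on the distilled data gives theta_upd (eqs. 2-3)
            f_syn = loss(theta_orig, syn_x, syn_y)
            g = torch.autograd.grad(f_syn, theta_orig, create_graph=True)[0]
            theta_upd = theta_orig - eta * g

            # 3.4 update the distilled data and eta from the real mini-batch (eqs. 4-5)
            f_real = loss(theta_upd, xb, yb)
            g_x, g_eta = torch.autograd.grad(f_real, [syn_x, eta])
            with torch.no_grad():
                syn_x -= lam * g_x
                eta -= lam * g_eta

            # 3.5 stop once the loss of theta_upd on the distilled data is below end_f
            if loss(theta_upd, syn_x, syn_y).item() <= end_f:
                break
        return syn_x.detach(), syn_y, eta.detach()

    # Example with random data standing in for a client's local set:
    # theta = torch.zeros(64, 10, requires_grad=True)
    # x, y = torch.randn(500, 64), torch.randint(0, 10, (500,))
    # syn_x, syn_y, eta = distill_dataset(theta, x, y, num_classes=10)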
Step S4, the server scores according to the distillation data set, in combination with the ResNet-101 classification model of the previous round and the gradient uploaded by the selected client in the previous round, to obtain the cumulative score of the selected client.
Each client is scored based on the gradient and the distillation data set it uploads, and its cumulative score is calculated. Fig. 5 is a flowchart of the cumulative scoring process. The parameters required here are assumed to be: the global model $\theta_{t-1}$, the cumulative score S, the cosine similarity C, the tolerance factor Q, the critical factor B, and the fully malicious factor V. Here $\theta_{t-1}$ denotes the global model used when scoring the client, which in this embodiment is the initial global model of the previous round; S denotes the score the client accumulates over multiple rounds of checking; C denotes the cosine similarity between the gradient uploaded by the client and the gradient computed from the distillation data set and the initial global model $\theta_{t-1}$; Q denotes the minimum cosine similarity at which the client is judged to be completely normal, which in this embodiment is cos 15°; B denotes the critical cosine similarity at which the client is judged to be normal or malicious, which in this embodiment is cos 30°; and V denotes the maximum cosine similarity at which the client is judged to be fully malicious, which in this embodiment is cos 90°.
Referring to fig. 5, the cumulative scoring process is as follows:
4.1 After the server receives the distillation data set $\tilde{D}_k$ of client k, it performs one step of gradient descent based on the global model $\theta_{t-1}$ of the previous round and $\tilde{D}_k$, and computes the gradient $\nabla\theta_{t-1}^{\tilde{D}_k}$ of the global model.
4.2 Compute the cosine similarity $C_k$ between this gradient and the gradient $\nabla\theta_{t-1}^{k}$ uploaded by the client in the previous round. The cosine similarity $C_k$ is expressed as:
$C_k = \dfrac{\nabla\theta_{t-1}^{\tilde{D}_k} \cdot \nabla\theta_{t-1}^{k}}{\left\|\nabla\theta_{t-1}^{\tilde{D}_k}\right\| \left\|\nabla\theta_{t-1}^{k}\right\|}$  (6)
where $\nabla\theta_{t-1}^{k}$ is the gradient uploaded by client k in the previous round (round t-1).
4.3 Compute the client's score $S_{k,t}$ for this round from the cosine similarity $C_k$ by formula (7), an expression in $C_k$ involving the tolerance factor Q, the critical factor B and the fully malicious factor V. In this expression, L > 0 is the scaling factor of the score and limits $S_{k,t}$ to the interval [-L, L]; in this embodiment L = 5. K > 1 is the scoring speed factor; besides keeping the argument of the logarithm function within its domain, it controls how quickly $S_{k,t}$ decreases: the farther $C_k$ is from B, the faster the decrease, and the smaller K is, the greater the acceleration of $S_{k,t}$ with respect to the distance between $C_k$ and B. In this embodiment K = 2.
4.4 Update the cumulative score $S_k$ of this client:
$S_k = S_k + S_{k,t}$  (8)
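Steps 4.1 to 4.4 can be sketched as follows. Because the exact scoring formula (7) is not reproduced in this text, the function score_from_similarity below is only an illustrative stand-in that respects the stated properties (score bounded in [-L, L], positive above the critical factor B, fully negative at or below the fully malicious factor V); it is not the patented formula, and the function names are assumptions.

    import numpy as np

    L_SCALE = 5.0                                  # scaling factor L (embodiment: 5)
    Q = np.cos(np.radians(15))                     # tolerance factor
    B = np.cos(np.radians(30))                     # critical factor
    V = np.cos(np.radians(90))                     # fully malicious factor

    def cosine_similarity(g_server, g_client):
        """Eq. (6): cosine similarity between the gradient the server derives
        from the client's distilled data and the gradient the client uploaded
        in the previous round."""
        g_server, g_client = np.ravel(g_server), np.ravel(g_client)
        return float(g_server @ g_client /
                     (np.linalg.norm(g_server) * np.linalg.norm(g_client)))

    def score_from_similarity(c):
        """Illustrative stand-in for formula (7): +L above Q, -L at or below V,
        and a monotone interpolation crossing zero at B in between."""
        if c >= Q:
            return L_SCALE
        if c <= V:
            return -L_SCALE
        if c >= B:
            return L_SCALE * (c - B) / (Q - B)
        return -L_SCALE * (B - c) / (B - V)

    def update_cumulative_score(scores, k, g_server, g_client_prev):
        """Steps 4.2-4.4: compute C_k, the per-round score S_{k,t}, and add it
        to the client's cumulative score S_k (eq. 8); S_init = 5 in the embodiment."""
        c_k = cosine_similarity(g_server, g_client_prev)
        scores[k] = scores.get(k, 5.0) + score_from_similarity(c_k)
        return scores[k]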
and S5, enabling the selected client with the accumulated score larger than or equal to the score threshold to participate in next round of federal training, and rejecting the selected client with the accumulated score smaller than the score threshold.
The malicious clients with the accumulated scores smaller than the score threshold value and intentionally uploading error gradients can be removed, the interference of the malicious clients on the federal learning process is reduced, the accuracy of the federal learning algorithm can be further improved, and the model can quickly reach the convergence state.
Step S6, the unselected clients perform local training based on their own local data sets and the ResNet-101 classification model of the current round, and upload the gradients obtained after training to the server.
After an unselected client receives the global model sent by the server, it performs E rounds of local training based on the local data it holds and the global model, where E is the number of times each unselected client updates the model parameters locally by gradient descent in each round.
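Step S6 can be sketched as below, with a generic PyTorch classifier standing in for ResNet-101. Reporting (theta_before - theta_after) / lr as the uploaded "gradient" after E local epochs is an assumption about the upload format made for illustration.

    import copy
    import torch
    import torch.nn.functional as F

    def local_training(global_model, data_loader, E=5, lr=0.01):
        """Step S6 sketch: copy the current global model, run E epochs of
        local SGD on the client's own data, and return the update that
        would be uploaded to the server."""
        model = copy.deepcopy(global_model)
        before = [p.detach().clone() for p in model.parameters()]
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        model.train()
        for _ in range(E):
            for x, y in data_loader:
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                opt.step()
        return [(b - p.detach()) / lr for b, p in zip(before, model.parameters())]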
Step S7, the server calculates the ResNet-101 classification model of the next round according to the gradients uploaded by all the unselected clients.
The ResNet-101 classification model of the next round, $\theta_{t+1}$, is obtained from the ResNet-101 classification model of round t, $\theta_t$, by a gradient-descent update that aggregates the gradients $\nabla\theta_t^i$ uploaded by the unselected clients i in round t, weighted by their local data set sizes $n_i$ and restricted to clients whose cumulative score $S_i$ in round t is not less than the score threshold $S_{limit}$; $\alpha$ is the learning rate used when the model parameters are updated by gradient descent, and $S_{init}$ is the initial cumulative score of client i.
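Under that reading, step S7 amounts to a data-size-weighted aggregation of the uploaded gradients restricted to clients whose cumulative score has not fallen below the threshold, as in the sketch below; normalising the weights over the surviving clients only is an assumption where the original formula is not reproduced.

    import torch

    def aggregate(global_params, uploads, data_sizes, scores, s_limit, alpha=1.0):
        """Step S7 sketch: theta_{t+1} = theta_t - alpha * sum_i w_i * grad_i,
        where client i gets weight n_i / sum(n_j) over the clients whose
        cumulative score S_i >= S_limit; rejected clients are skipped."""
        kept = [i for i in uploads if scores[i] >= s_limit]
        new_params = [p.detach().clone() for p in global_params]
        if not kept:
            return new_params                    # no trusted update this round
        total = sum(data_sizes[i] for i in kept)
        for i in kept:
            w = data_sizes[i] / total
            for p_new, g in zip(new_params, uploads[i]):
                p_new -= alpha * w * g           # gradient-descent step on the global model
        return new_params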
Step S8, when the number of rounds reaches the global iteration count, the federated training stops and the final ResNet-101 classification model is obtained.
During training there may be lazy clients that never train the model but keep obtaining the global model. To do so, such lazy clients tend to send a random gradient close to 0. Because they cannot predict the correct update direction of the model, the random gradients they upload are likely to deviate from the correct direction. If the scoring rule is set strictly (in this embodiment, the score is positive only when the angle between the two gradients is within 30 degrees), the score will be negative with high probability; moreover, if the uploaded gradient is 0, the score is directly negative. The method therefore also removes lazy clients to a certain extent and can prevent eavesdropping by such clients to a certain degree.
In the client training process, the method first requires each client to record the global model of the previous round and the loss function obtained in its last local update of that round. Second, at the start of each round, the server randomly selects a certain proportion of the clients participating in training and requires them not to perform the next round of updates but instead to perform data set distillation based on the global model of the previous round and their local data sets, with the termination condition that the loss is smaller than the client's loss at the end of the previous round. The server then uses the distillation data set uploaded by each selected client to compute a gradient of the model, computes the cosine similarity between this gradient and the gradient the client uploaded in the previous round, and updates the client's cumulative score. If the cumulative score is smaller than the specified threshold, the client is prohibited from (rejected for) subsequent federated learning, and the global model is updated only with the gradients uploaded by clients that pass the evaluation, until the model converges. The method not only eliminates malicious clients that intentionally upload erroneous gradients and lazy clients that only obtain the model without participating in training, but also ensures to a certain extent that training accuracy is not reduced, effectively safeguarding the security of the federated learning process.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principle and the implementation of the present invention are explained by applying specific examples in the embodiment, and the description of the above embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A federated learning method for eliminating malicious clients, characterized by comprising the following steps:
dividing the ImageNet data set into a plurality of subsets and respectively distributing the subsets to each client as a local data set of each client;
selecting clients with a preset proportion from all the clients, sending a check command, and sending the ResNet-101 classification model of the current turn to the unselected clients;
the selected client side carries out data set distillation according to the ResNet-101 classification model and the local data set of the previous round, and a distillation data set obtained after the data set distillation is uploaded to the server;
the server scores according to the distillation data set and by combining the ResNet-101 classification model of the previous round and the gradient uploaded by the selected client in the previous round, and obtains the accumulated score of the selected client;
enabling the selected clients whose cumulative scores are greater than or equal to the score threshold to participate in the next round of federated training, and rejecting the selected clients whose cumulative scores are below the score threshold;
the unselected clients perform local training based on local data sets of the clients and the ResNet-101 classification model of the current turn, and upload gradients obtained after training to the server;
the server calculates a ResNet-101 classification model of the next round according to the gradients uploaded by all the unselected clients;
and when the number of rounds reaches the global iteration count, stopping the federated training to obtain the final ResNet-101 classification model.
2. The federated learning method for eliminating malicious clients as claimed in claim 1, wherein the selected client performing data set distillation according to the ResNet-101 classification model of the previous round and the local data set specifically comprises:
randomly initializing, by the selected client, the learning rate $\eta$ and a distillation data set $\tilde{D}$ consisting of m data;
randomly selecting, by the selected client, b data from the local data set D to form a mini-batch $D_{batch}$;
updating the parameters of the ResNet-101 classification model of the previous round by gradient descent according to the learning rate $\eta$ and the distillation data set $\tilde{D}$ to obtain updated model parameters $\theta_{upd}$;
updating the distillation data set $\tilde{D}$ and the learning rate $\eta$ by gradient descent based on the mini-batch $D_{batch}$ and the updated model parameters $\theta_{upd}$;
computing the cross-entropy loss function $f(\tilde{D}; \theta_{upd})$ based on the updated distillation data set and the updated model parameters $\theta_{upd}$;
if $f(\tilde{D}; \theta_{upd}) > end_f$ and the maximum number of iterations T has not been reached, replacing the learning rate $\eta$ with the updated learning rate, replacing the distillation data set $\tilde{D}$ with the updated distillation data set, and returning to the step of "randomly selecting, by the selected client, b data from the local data set D to form a mini-batch $D_{batch}$", wherein $end_f$ is the upper bound of the loss function;
if $f(\tilde{D}; \theta_{upd}) \le end_f$ or the maximum number of iterations T has been reached, ending the data set distillation and outputting the updated distillation data set.
3. The federated learning method for eliminating malicious clients as claimed in claim 2, wherein updating the parameters of the ResNet-101 classification model of the previous round by gradient descent according to the learning rate $\eta$ and the distillation data set $\tilde{D}$ to obtain the updated model parameters $\theta_{upd}$ specifically comprises:
computing, from the distillation data set $\tilde{D}$, the gradient $\nabla_{\theta_{orig}} f(\tilde{D}; \theta_{orig})$ of the parameters $\theta_{orig}$ of the ResNet-101 classification model of the previous round, wherein $f(\tilde{D}; \theta_{orig})$ is the cross-entropy loss function computed from the distillation data set $\tilde{D}$ of the selected client and the parameters $\theta_{orig}$, and $\nabla_{\theta_{orig}} f(\tilde{D}; \theta_{orig})$ is the gradient obtained by taking the partial derivative of $f(\tilde{D}; \theta_{orig})$ with respect to $\theta_{orig}$;
computing the updated model parameters $\theta_{upd} = \theta_{orig} - \eta\, \nabla_{\theta_{orig}} f(\tilde{D}; \theta_{orig})$ according to the learning rate $\eta$ and the parameters $\theta_{orig}$ of the ResNet-101 classification model of the previous round.
4. The federated learning method for eliminating malicious clients as claimed in claim 1, wherein the server scoring according to the distillation data set, in combination with the ResNet-101 classification model of the previous round and the gradient uploaded by the selected client in the previous round, to obtain the cumulative score of the selected client specifically comprises:
the server calculates the gradient of the ResNet-101 classification model of the previous round according to the distillation data set and the ResNet-101 classification model of the previous round;
calculating cosine similarity between the gradient of the ResNet-101 classification model of the previous round and the gradient uploaded by the selected client in the previous round;
calculating the grade of the selected client in the current turn according to the cosine similarity;
and adding the score of the current turn with the accumulated score before the current turn to obtain the updated accumulated score of the selected client.
5. The federated learning method for eliminating malicious clients as claimed in claim 4, wherein the cosine similarity is calculated as
$C_k = \dfrac{\nabla\theta_{t-1}^{\tilde{D}_k} \cdot \nabla\theta_{t-1}^{k}}{\left\|\nabla\theta_{t-1}^{\tilde{D}_k}\right\| \left\|\nabla\theta_{t-1}^{k}\right\|}$
wherein $C_k$ is the cosine similarity, $\nabla\theta_{t-1}^{\tilde{D}_k}$ is the gradient of the ResNet-101 classification model of round t-1 computed by the server from the distillation data set $\tilde{D}_k$ uploaded by the client k selected in the t-th round, and $\nabla\theta_{t-1}^{k}$ is the gradient uploaded by the selected client k in round t-1;
and the score $S_{k,t}$ of the selected client in the current round is calculated from the cosine similarity $C_k$, wherein L is the scaling factor of the score, L > 0, Q is the tolerance factor, B is the critical factor, V is the fully malicious factor, K is the speed factor of the score, and K > 1.
6. The federated learning method for eliminating malicious clients as claimed in claim 1, wherein the ResNet-101 classification model of the next round, $\theta_{t+1}$, is obtained from the ResNet-101 classification model of round t, $\theta_t$, by a gradient-descent update that aggregates the gradients $\nabla\theta_t^i$ uploaded by the unselected clients i in round t, weighted by their local data set sizes $n_i$ and restricted to clients whose cumulative score $S_i$ in round t is not less than the score threshold $S_{limit}$, wherein $\alpha$ is the learning rate used when the model parameters are updated by gradient descent and $S_{init}$ is the initial cumulative score of client i.
7. The federated learning method for eliminating malicious clients as claimed in claim 1, wherein before selecting a preset proportion of clients from all the clients and sending the check command, and sending the ResNet-101 classification model of the current round to the unselected clients, the method further comprises:
the server initializing the ResNet-101 classification model $\theta_0$ and initializing the cumulative score of every client to $S_{init}$;
if the current round is the first round, sending the ResNet-101 classification model $\theta_0$ to each client.
8. The federated learning method for eliminating malicious clients as claimed in claim 1, wherein the same client cannot be selected in two consecutive rounds.
CN202211638722.0A 2022-12-20 2022-12-20 Federal learning method for eliminating malicious clients Pending CN115861705A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211638722.0A CN115861705A (en) 2022-12-20 2022-12-20 Federal learning method for eliminating malicious clients

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211638722.0A CN115861705A (en) 2022-12-20 2022-12-20 Federal learning method for eliminating malicious clients

Publications (1)

Publication Number Publication Date
CN115861705A true CN115861705A (en) 2023-03-28

Family

ID=85674359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211638722.0A Pending CN115861705A (en) 2022-12-20 2022-12-20 Federal learning method for eliminating malicious clients

Country Status (1)

Country Link
CN (1) CN115861705A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436515A (en) * 2023-12-07 2024-01-23 四川警察学院 Federal learning method, system, device and storage medium
CN117436515B (en) * 2023-12-07 2024-03-12 四川警察学院 Federal learning method, system, device and storage medium

Similar Documents

Publication Publication Date Title
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN106415594B (en) Method and system for face verification
CN107506799B (en) Deep neural network-based mining and expanding method and device for categories of development
JP6159489B2 (en) Face authentication method and system
CN110443351B (en) Generating natural language descriptions of images
CN110659723B (en) Data processing method and device based on artificial intelligence, medium and electronic equipment
CN105205448A (en) Character recognition model training method based on deep learning and recognition method thereof
CN111400452B (en) Text information classification processing method, electronic device and computer readable storage medium
CN103729459A (en) Method for establishing sentiment classification model
CN110462638B (en) Training neural networks using posterior sharpening
CN109948149A (en) A kind of file classification method and device
CN113642431A (en) Training method and device of target detection model, electronic equipment and storage medium
CN111104941B (en) Image direction correction method and device and electronic equipment
CN106155327A (en) Gesture identification method and system
CN112446331A (en) Knowledge distillation-based space-time double-flow segmented network behavior identification method and system
Seo et al. FaNDeR: fake news detection model using media reliability
CN115861705A (en) Federal learning method for eliminating malicious clients
CN114742224A (en) Pedestrian re-identification method and device, computer equipment and storage medium
CN114065834B (en) Model training method, terminal equipment and computer storage medium
CN112861601A (en) Method for generating confrontation sample and related equipment
CN116416212B (en) Training method of road surface damage detection neural network and road surface damage detection neural network
WO2024015591A1 (en) Efficient decoding of output sequences using adaptive early exiting
CN111783688A (en) Remote sensing image scene classification method based on convolutional neural network
CN113722477B (en) Internet citizen emotion recognition method and system based on multitask learning and electronic equipment
CN114743049A (en) Image classification method based on course learning and confrontation training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination